System and method of detecting and assessing multiple types of risks related to mortgage lending

ABSTRACT

Embodiments include systems and methods of detecting and assessing multiple types of risks related to mortgage lending. One embodiment includes a system and method of detecting and assessing risks including fraud risks, early payment default risks, and risks related to fraudulently stated income on loan applications. One embodiment includes a computerized method that includes creating a combined risk detection model based on a plurality of risk detection models and using the combined risk detection model to evaluate loan application data and generate a combined risk score that takes into account interaction of different types of risks individually and collectively detected by the plurality of risk detection models.

RELATED APPLICATIONS

This application is related to U.S. patent application Ser. No. 12/538,721, which is a continuation of U.S. patent application Ser. No. 11/526,208, now issued as U.S. Pat. No. 7,587,348, which claims the benefit of U.S. provisional patent application No. 60/785,902, filed Mar. 24, 2006 and U.S. provisional patent application No. 60/831,788, filed on Jul. 18, 2006. Portions of the '721 application are reproduced herein. The disclosure of publications and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.

BACKGROUND OF THE DISCLOSURE

1. Field of the Invention

The present disclosure relates to computer processes for detecting and assessing multiple types of risks in financial transactions.

2. Description of the Related Technology

Many financial transactions are fraught with risks. For example, a mortgage lender may face risks of borrower default and fraud. A fraud detection system may be configured to analyze loan application data to identify applications that are being submitted with fraudulent application data. A separate default risk detection system may be configured to analyze the same application data to address the risk of borrower default.

However, existing risk detection systems have failed to keep pace with the dynamic nature of financial transactions. Moreover, such systems have failed to take advantage of the increased capabilities of computer systems. Thus, a need exists for improved systems and methods of detecting and assessing various types of risks associated with financial transactions.

SUMMARY OF THE DISCLOSURE

The system, method, and devices disclosed herein each have several aspects, no single one of which is solely responsible for its desirable attributes. Without limiting the scope of the various embodiments as expressed by the claims which follow, the more prominent features of the various embodiments will now be discussed briefly. After considering this discussion, and particularly after reading the section entitled “Detailed Description of Certain Embodiments,” one will understand how the features of the various embodiments provide advantages that include improved detection and assessment of risks in financial transactions such as mortgage transactions.

Embodiments disclosed herein provide systems and methods for detecting and assessing various types of risks associated with financial transactions, such as transactions involved in mortgage lending. Embodiments of the risk detection and assessment system combine two or more individual data models that are configured to detect and assess particular types of risks into a single combined model that is better suited for detecting risks in the overall transactions. Various embodiments disclosed herein combine discrete data models, each of which may be utilized on its own to provide a specific risk score. In one embodiment, the data models include at least a model for detecting and assessing mortgage fraud risk, a model for detecting and assessing early mortgage payment default risk, and a multi-component risk model for detecting and assessing risks, with the model based primarily on analysis of data external to a mortgage loan (e.g., analysis of property values in the local market). Other embodiments of the detection and assessment system may include additional models, e.g., a model for detecting the presence of fraudulently reported income data.

Although the individual models may be capable of predicting individual risks, they may only offer a partial picture of the overall risks. From a risk management standpoint, a user of such predictive models would typically stand to suffer financial losses in mortgage transactions if any of such risks materialize. While it is theoretically possible to apply many or all of these individual models for every loan application, generate scores from all the models and review them, in practice this becomes burdensome on the human reviewers. Indeed, by definition a score is an abstraction of the risks, and the very nature of a risk score is to enable quick detection and assessment of risks without a human review of all the underlying data.

Therefore, in one embodiment, the combined model takes as input selected scores output by the individual models and potentially other data, processes the selected scores and other data, and generates a single combined score that may reflect an overall risk of a particular transaction. The combined model presents these risks in a comprehensive fashion and is configured to detect potentially hidden risks that may otherwise be difficult to detect by an individual model. Additional performance gains of the combined model over the individual models may include a reduction of false positives, an increase in the dollar amount of identified fraudulent and/or high-risk loans, and an increase in the instances of identified fraudulent and/or high-risk loans.

In one embodiment, such a combined model may be created based on evaluating the performance of the underlying models (or sets of models) in detecting risks, including fraud and default risks. One or more combined models may be generated by using data including a set of historical transactions in which fraud and/or default outcomes are known. Other combined models may be based on data including test/training data, current data, real-time data, or a mix of historical data, current data, and/or real-time data. Additionally or alternatively, the correlation between the underlying models may be measured, and selected features from the models may be used to create a combined model that is trained on data such as test/training data. The features selected may be based on the type of data analysis modeling structure(s) and technique(s) chosen for the combined model. The performance of the resulting combined model may be evaluated against the performance of the individual models, and adjustments to the combined model may be made to further improve performance.

The combined models as described herein are especially suitable for mortgage fraud and default detection because many parties are involved in the whole mortgage origination and funding process and mortgage risk exists almost everywhere, from borrowers, to collaterals, to brokers. By combining results from different models having focus in different domains (such as borrower risk, collateral risk, broker risk, identity risk, loan risk, etc.), the combined model(s) provide a more comprehensive and accurate risk assessment of each loan application than any single model alone can provide.

As disclosed herein, the term “mortgage” may include residential, commercial, or industrial mortgages. In addition, “mortgage” may include first, second, home equity, or any other loan associated with a real property. In addition, it is to be recognized that other embodiments may also include risk detection and assessment in other types of loans or financial transactions such as credit card lending and auto loan lending.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a functional block diagram illustrating a risk detection and assessment system in accordance with an embodiment.

FIG. 1B is a schematic diagram illustrating an aspect of the combined scoring model that provides an overall risk picture of a mortgage lending transaction.

FIG. 2 is a flowchart illustrating the operation of the risk detection and assessment system in accordance with an embodiment.

FIG. 3A is a flowchart illustrating a method of creating a combined model for detecting and assessing risks in financial transactions in accordance with an embodiment.

FIG. 3B is a flowchart illustrating a method of building a combined model for detecting and assessing risks in financial transactions in accordance with an embodiment.

FIG. 3C is a flowchart illustrating an embodiment of a method of providing a score indicative of risks using the combined model.

FIG. 4 is a sample report showing a risk score and associated risk indicators generated by the combined model in accordance with an embodiment.

FIG. 5A is a functional block diagram illustrating the generation and execution of one model in accordance with an embodiment.

FIG. 5B is a functional block diagram illustrating example models used in the model of FIG. 5A.

FIG. 5C is a flowchart illustrating another embodiment of model generation for use in the model of FIG. 5A.

FIG. 6A is a flowchart illustrating a supervised method of generating a model for use in a model that is useable in an embodiment of the risk detection and assessment system.

FIG. 6B is a flowchart illustrating an unsupervised method of generating a model for use in a model that is useable in an embodiment of the risk detection and assessment system.

FIG. 7 is a flowchart illustrating an example of using a model based on historical transactions to generate a score indicative of fraud risk for use as part of a combined model in accordance with an embodiment.

FIG. 8 is a functional block diagram illustrating components of a multi-component risk model that is useable as part of the overall combined model in accordance with an embodiment.

FIG. 9 is a functional block diagram illustrating the generation and execution of another model that is useable as part of the overall combined model in accordance with an embodiment.

FIG. 10 is a flowchart illustrating an example of using a model for detecting fraud that is based on applicant income to generate a validity measure for use as part of a combined model in accordance with an embodiment.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

The following detailed description is directed to certain specific embodiments of the invention. However, the invention can be embodied in a multitude of different ways as defined and covered by the claims. In this description, reference is made to the drawings wherein like components are designated with like numerals throughout.

Risk Detection and Assessment System Overview

FIG. 1A is a functional block diagram illustrating a risk detection and assessment system 100. In one embodiment, the risk detection and assessment system is used with a mortgage origination system 116. In other embodiments, the risk detection and assessment system 100 may be used in evaluating mortgage applications and/or funded loans by an investment bank or as part of due diligence of a loan portfolio. The risk detection and assessment system 100 may receive and store data in a storage 104. The storage 104 may comprise one or more database servers or devices, and may use any suitable configuration of volatile and/or persistent memory. The risk detection and assessment system 100 may be configured to receive mortgage application data from the mortgage origination system 116 and provide results 124 of its risk detection and assessment, via a score reporting module 126, back to the mortgage origination system 116. In one embodiment, the risk detection and assessment system 100 uses multiple models to generate the results of its detection and assessment of data indicative of various types of risks, including, for example, fraud risks and default risks. The results 124 may also be provided, via the score reporting module 126, to a risk manager system 118 for further processing and/or analysis by a human operator. The risk manager system 118 may be provided in conjunction with the risk detection and assessment system 100 or in conjunction with the mortgage origination system 116.

A model generator 106 may provide models 110 to the risk detection and assessment system 100. In one embodiment, the model generator 106 provides the models periodically to the system 100, such as when new versions of the system 100 are released to a production environment. In other embodiments, at least a portion of the model generator 106 is included in the system 100 and configured to automatically update at least a portion of the models in the system 100. Each model may, for example, be in the form of a code module executed by computer hardware, and may embody a particular risk assessment algorithm. The models 110 may include one or more discrete models that are configured to assess certain types of risks and may generate risk scores and/or risk indicators. Models such as models 111, 113, 115, and 117 are described in further detail below, and these models may be combined together by a model combining module 122 to create a combined model 112. The creation of the combined model is shown by the dotted arrow lines to the right of the models 110, and the combined model creation process will be further described below in conjunction with FIGS. 2, 3A, and 3B. Each individual model may operate or be used independently to generate a score or indicator of risk. For example, the fraud detection model 111 may be used to generate an independent score that predicts presence of fraudulent application data in a mortgage application. In one embodiment, the model combining module 122 combines two or more of these models (including any other suitable model(s) 119, if any) to generate scores or risk indicators. In one embodiment, the combined model 112 may be encoded in software, such as analytical software available from the SAS Institute, Inc.

In one embodiment, once the combined model 112 is generated, when a particular loan application to be assessed is submitted to the combined model 112 in operation, the combined model 112 takes as input outputs 130 (e.g., risk scores) generated by the individual models 110 and/or other data 132. Other data 132 may include loan balance data. As shown, the combined score calculation process is indicated by the dashed arrow lines to the left of the models 110, and the process will be further described below in conjunction with FIGS. 2 and 3C. The individual models 110 may take as input loan data from the mortgage origination system 116 and/or the storage 104, credit data, property data, and other data from the system 116, the storage 104 and/or other sources, in order to derive the individual score outputs 130. In one embodiment, an input selection module 128 selects a portion of the outputs 130 and the other data 132 for input into the combined model 112. The combined model offers enhanced risk detection and assessment capabilities because it is able to evaluate the interaction of various types of risks, each of which would normally be detected by a particular type of risk detection model. For example, the combined model may be better suited to detect data indicative of risks that may be undetectable by the individual models. In addition, false positives may be reduced in the combined model as it is built upon recognizing the problematic areas of the individual models and the various models are able to complement one another.

Finally, the results 124 including the calculated combined score, and optionally the associated risk indicators, are provided through a score reporting module 126 to the mortgage origination system 116, the risk manager system 118, and/or other systems.

Brief Overview of the Individual Models

Embodiments of the models 110 include a fraud detection model 111 that detects the risk of the presence of fraudulent data in a mortgage application. The model 111 may be created from a variety of data, including but not limited to: data indicative of historical transactions and, optionally, data related to historical transactions of entities (e.g., brokers, appraisers, etc.) other than the subject loan application. One suitable embodiment of the model 111 is described in U.S. Pat. No. 7,587,348 entitled “SYSTEM AND METHOD OF DETECTING MORTGAGE RELATED FRAUD,” which is owned by the assignee of the present application and the disclosure of which is hereby incorporated by reference. Portions of the '348 patent are also reproduced herein. Further details of one embodiment of the model(s) 111 are described below in the section entitled “Fraud Detection Model.” In one embodiment, the model 111 generates a score for each mortgage loan application to provide lenders with accurate detection of suspicious loan fraud activity.

One embodiment of the models 110 includes a multi-component risk model 113 that generates one or more risk scores and/or indicators relating to various types of risks associated with mortgage lending. One embodiment of the multi-component risk model 113 is based on or otherwise combines one or more of the following risk scores: (1) a property risk score, (2) a broker risk score, (3) a borrower risk score, (4) a market risk score, and (5) an overall risk score based on several or all of the above risk scores. In one embodiment, the multi-component risk model 113 analyzes data external to a subject loan or mortgage transaction to determine a risk of the transaction. For example, the multi-component risk model 113 may evaluate recent property sales in the local real estate market to derive a property risk score that indicates a risk of early payment default (90+ days delinquent in the first year) and substantial loss in value in the subject property. Embodiments of the model 113 are described in further detail below in the section entitled “Multi-Component Risk Model.”

Other models 110 may include a model 115 that generates data indicative of early payment default (EPD). This EPD model 115 may provide a risk score indicative of an early payment default risk by the borrower (e.g., default in the first few months of the loan term). One suitable embodiment of such a system and model 115 is disclosed in U.S. Patent Publication No. 2009/0099959, filed on Oct. 6, 2008 and entitled “METHODS AND SYSTEMS OF PREDICTING MORTGAGE PAYMENT RISK,” which is owned by the assignee of the present application and the disclosure of which is hereby incorporated by reference in its entirety. Portions of the '959 published application are reproduced herein. In one such embodiment, the EPD model 115 includes a method of detecting a risk of payment defaults; the method includes (1) receiving mortgage data associated with a mortgage application, (2) determining a first score for the mortgage data based on one or more models that are in turn based on data from historical mortgage transactions and historical credit information related to the applicant, and (3) generating data indicative of a risk of payment default based on the first score. The credit information may include information related to payment history, credit scores, employment, tenure, income, and/or debt. The mortgage application data may include property valuation information and geographic information. For example, the model or models 115 may be configured to output scores and/or other risk indicators based in part on geographic default risk information. Embodiments of the models 115 are described in further detail below in the section entitled “Early Payment Default Model.”

Models 110 may additionally include a model 117 that generates an indicator on whether income data or stated income data is likely to be accurate. Suitable embodiments of such model 117 are disclosed in U.S. patent application Ser. No. 11/864,606, filed on Sep. 28, 2007, which is owned by the assignee of the present application and the disclosure of which is hereby incorporated by reference. Portions of the '606 application are reproduced herein. Embodiments of the model 117 are further described below in the section entitled “Income Related Fraud Detection Model.”

Providing an Overall Risk Picture

FIG. 1B illustrates an aspect of the result generated by the combined model 112 that is reflective of the overall risk picture of a mortgage application. In a typical mortgage lending scenario, a number of risks are present, as shown for example in FIG. 1B. Individual data models may be geared toward detecting and assessing these individual risks. However, as shown in FIG. 1B, these risks often overlap and may be interrelated. For example, a fraud in the stated income may be a part of a larger fraudulent scheme, and may be relevant to early payment default risk.

By combining individual data models through the systems and methods described herein, the risk detection and assessment system 100 is able to provide a combined risk score and associated risk indicators that reflect an overall risk assessment that takes into account the risk components in the overall risk picture as well as the individual weights of these risk components. The system 100 may also be able to examine the interaction of risks and detect hidden risk patterns that otherwise may not be easily detectable by individual models that focus on certain types of risks. In addition, by using a combined model approach, the risk detection and assessment system 100 may reduce the number of false positives in its results.

The combined score is likely to be more predictive of the loss event (e.g., fraud, default) than each individual risk score. A loan officer may thus elect to review all loan applications receiving a certain threshold combined score (e.g., a combined score of 750 or higher on a scale of 1-999, with the higher score indicating a higher risk). The higher predictive rate will assist the officer in the task of selecting the proper applications for further review while reducing efforts expended on the review of false positive applications.
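The threshold-based review described above can be illustrated with a minimal sketch. The function name, the application identifiers, and the sample scores are hypothetical; only the 1-999 scale and the example cutoff of 750 come from the description.

```python
def select_for_review(combined_scores, threshold=750):
    """Return the ids of applications whose combined score meets the threshold.

    combined_scores maps a (hypothetical) application id to a combined risk
    score on the 1-999 scale, where a higher score indicates higher risk.
    """
    return [app_id for app_id, score in combined_scores.items() if score >= threshold]


# Illustrative scores only; not taken from any real data set.
combined_scores = {"A-1": 812, "A-2": 430, "A-3": 750, "A-4": 749}
review_queue = select_for_review(combined_scores)
```

In this sketch an application scoring exactly 750 is included in the review queue, matching the "750 or higher" example.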

Risk Detection and Assessment Process

FIG. 2 is a flowchart illustrating a method of operation 200 of the risk detection and assessment system 100. In one embodiment, the method 200 begins at a block 202 in which the model generator 106 generates models (e.g., a fraud detection model) based on respective data sources. The models can also be generated by human programmers. In another embodiment, the model generator 106 receives previously generated models from an external entity. Models may be generated in a supervised or unsupervised manner. For example, parts of the fraud detection model 111 may be generated based on supervised training or data analysis that is based on data including historical transactions that have been identified as fraudulent or non-fraudulent. Further details on generating supervised models are discussed with reference to FIG. 6A. Moreover, portions of the model 111 may also include unsupervised entity models such as the account executive model 542, the broker model 544, the loan officer model 546, or the appraiser (or appraisal) model 548. Further details on generating unsupervised models are discussed below with reference to FIG. 6B.

Next at a block 204, the risk detection and assessment system 100 creates one or more combined models 112 based on the individual models 110. In one embodiment, the creation of the combined model 112 includes evaluating the combinability of the models 110 and their individual predictive performances. For example, the individual models may be applied to historical transactions with known fraudulent and non-fraudulent transactions. The results of such applications may be compared to determine whether combining certain models results in better overall predictive performance. In one embodiment, different combinations are tested against data with known outcomes to determine which combinations are suitable. As further described herein, the creation of the combined model 112 may involve additional processing such as feature extraction, correlation of the results of the models 110 and/or of the results and data fields, and execution of supervised learning and/or unsupervised learning methods. Further details on creating the combined model are provided with reference to FIGS. 3A-3B.

Proceeding to a block 206, the system 100 in one embodiment applies the individual models 110 to data (including loan data and other non-loan data such as public records, credit data, etc.) to generate risk scores. In a block 208, generated scores from the individual models are selected based on the combined model 112 that is created and/or in use. In one embodiment, more than one combined model may be created and placed in use, and each combined model may select different generated scores from the individual models. In the block 208, the selected scores may also be processed, i.e., combined and/or mathematically manipulated into input features that will serve as input to the combined model in use. An example input feature may be the maximum of two or more model scores, e.g., max(model score 1, model score 2, . . . , model score n). Another example input feature may be the average of several model scores. In other embodiments, the input features may include other non-score data such as a loan amount and a combination of scores and non-score data. In one embodiment, the risk indicators from the block 206 are provided to the combined model 112 as well.
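As a minimal sketch of the score processing in block 208, the example below derives the maximum and the average of several model scores and carries a loan amount through as a non-score feature. The function name, the dictionary keys, and the sample values are hypothetical, and the product feature is only one assumed way of combining a score with non-score data.

```python
from statistics import mean


def build_input_features(model_scores, loan_amount):
    """Derive combined-model input features from individual model scores.

    model_scores maps a (hypothetical) model name to its score; loan_amount
    is an example of non-score data.
    """
    scores = list(model_scores.values())
    return {
        "max_score": max(scores),    # max(model score 1, ..., model score n)
        "avg_score": mean(scores),   # average of several model scores
        "loan_amount": loan_amount,  # non-score data passed through unchanged
        # Assumed example of combining a score with non-score data.
        "max_score_x_loan_amount": max(scores) * loan_amount,
    }


features = build_input_features({"fraud": 820, "epd": 640, "income": 310}, 250000)
```

Which derived features are actually used would depend on the combined model in use; the set shown here is illustrative only.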

Proceeding to a block 210, the system 100 in one embodiment uses the combined model 112 to generate a combined risk score. Risk indicators may be provided by the combined model 112 as well, based on the risk indicators generated in the block 206 by the individual models. The risk indicators enable the system 100 to output explanatory, i.e., textual information along with the combined risk score so a user can better understand the risk factors that contributed to the combined risk score and take appropriate remedial actions. For example, the EPD model 115 may provide to the combined model 112 a risk indicator indicating a high EPD risk due to the borrower's credit history. In the final combined risk score output, if the EPD model score is deemed to have contributed to the combined risk score in a significant way, the same risk indicator may be provided to the user so the user can investigate the borrower's credit history. An example listing of risk indicators with a combined score will be further described below in conjunction with FIG. 4. In one embodiment, the functions of blocks 206, 208, and 210 may be repeated for each loan application that is to be processed.

In one embodiment, the model generator 106 generates and/or updates models 110 and their component models as new data is received or at specified intervals such as nightly or weekly. In other embodiments, some models 110 are updated continuously and others at specified intervals depending on factors such as system capacity, mortgage originator requirements or preferences, etc. In one embodiment, some models are updated periodically, e.g., nightly or weekly, while other models are only updated when new versions of the system 100 are released into operation.

Model Combination Process

FIG. 3A is a flowchart illustrating in further detail block 204 of FIG. 2. The method of creating a combined model in block 204 begins at a block 302 in which the model combining module 122 receives data (e.g., historical mortgage/loan data) for the purpose of evaluating and/or training one or more of the models 110. Receiving the data may include data preprocessing. For example, the received data may be collected in a comprehensive way to cover the required fields for some or all of the models 110. Such data may be extracted, mapped, and preprocessed into multiple datasets that serve as input data for each model 110. For example, in one embodiment, different models 110 may have different definitions and format requirements for one field, such that a field representing the same content may appear in a different format in the input datasets of different models 110.

Certain additional preprocessing may be performed on the data set to ensure good and reliable data for proper model training. This may include estimating missing values, converting categorical levels into numerical values, removing outliers (extreme values), and/or standardizing/normalizing the feature values.
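The four preprocessing steps named above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: median imputation, integer coding of sorted category levels, fixed lower and upper bounds for outlier removal, and z-score standardization are all choices made here for the example, and every function name is hypothetical.

```python
from statistics import mean, median, pstdev


def impute_missing(values):
    """Estimate missing values (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def encode_categories(levels):
    """Convert categorical levels into integer codes (sorted level order)."""
    mapping = {level: code for code, level in enumerate(sorted(set(levels)))}
    return [mapping[level] for level in levels]


def drop_outliers(values, lower, upper):
    """Remove extreme values that fall outside the closed range [lower, upper]."""
    return [v for v in values if lower <= v <= upper]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Illustrative values only.
incomes = impute_missing([50000, None, 70000, 90000])
occupancy = encode_categories(["owner", "investor", "owner"])
trimmed = drop_outliers([1, 2, 3, 1000], lower=0, upper=100)
z_scores = standardize(trimmed)
```

In practice the imputation rule and the outlier bounds would be chosen per field; the values above are placeholders.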

The received data (e.g., historical loan performance data), including payment history, default, fraud, foreclosure, and repurchase, etc., may be linked to the loan application data such that the loan data are tagged with an outcome label or indicator. In one embodiment, the good or non-fraudulent population is tagged with one label and the bad or fraudulent population is tagged with another. The purpose of this tagging is to provide a systematic training method that groups the training loans according to their internal risk characteristics and applies the same judgment to new loans without knowing their labels a priori.
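A minimal sketch of the tagging step follows, assuming each application record carries a loan identifier and that the performance data lists the adverse events observed for each loan. The field names, the set of events treated as bad, and the 1/0 label values are assumptions made for illustration.

```python
# Assumed set of adverse outcomes that mark a loan as "bad".
BAD_EVENTS = {"fraud", "default", "foreclosure", "repurchase"}


def tag_loans(applications, outcomes):
    """Attach an outcome label to each application: 1 for bad, 0 for good.

    applications is a list of dicts with a hypothetical "loan_id" key;
    outcomes maps a loan id to the set of events observed for that loan.
    """
    tagged = []
    for app in applications:
        events = outcomes.get(app["loan_id"], set())
        label = 1 if events & BAD_EVENTS else 0
        tagged.append({**app, "label": label})
    return tagged


# Illustrative records only.
applications = [{"loan_id": "L1"}, {"loan_id": "L2"}, {"loan_id": "L3"}]
outcomes = {"L1": {"fraud"}, "L2": {"paid_as_agreed"}}
tagged = tag_loans(applications, outcomes)
```

Loans with no recorded events are labeled good in this sketch; a real pipeline might instead exclude loans whose outcome is not yet known.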

Next at a block 304, in one embodiment, the model combining module 122 executes the component models 110 on the tagged transaction data and calibrates any resulting scores. In particular, in one embodiment, the tagged data is applied to each of the models 110 and the resulting scores and other outputs are processed to generate the combined model 112. In one embodiment, each model 110 runs the derived dataset received at the block 302 from the preprocessed data and generates a respective model score. In one embodiment, each score represents a specific risk associated with the respective model 110. The scores from the models may be calibrated to the same dynamic range (e.g., 1-999, ranging from low risk to high).
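The description does not specify how scores are calibrated to a common range. One simple possibility, shown here purely as an assumption, is linear min-max rescaling of each model's raw scores onto the 1-999 range; the function name is hypothetical.

```python
def calibrate(raw_scores, low=1, high=999):
    """Linearly rescale raw scores onto the integer range [low, high].

    Min-max scaling is an assumed calibration method chosen for illustration.
    """
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        # All raw scores are identical, so no spread can be preserved.
        return [low for _ in raw_scores]
    return [round(low + (s - lo) * (high - low) / (hi - lo)) for s in raw_scores]


calibrated = calibrate([0.0, 0.5, 1.0])
```

Other calibration schemes, such as mapping scores to observed bad rates, would serve the same purpose of making scores from different models comparable.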

Model Combination Process: Correlation Analysis

In one embodiment, after the scores are calibrated, in block 306, the model combining module 122 determines the combinability of the models based on the scores. In one embodiment, the results of the models 110 are subject to a correlation analysis. The correlation of the respective scores indicates how similar the model scores are. If the correlation of two model scores is high, the two model scores are very much alike and the small discrepancy between the two scores may not make a difference in the output of the combined model 112. In an extreme case where the correlation of two model scores is equal to 1, there is no need to combine the two scores together since they are identical. If one of the scores is highly correlated with another score and further analysis demonstrates high overlap in detection, the weighting of the two similar scores in the combined model 112 may be reduced. In one embodiment, correlation analysis is based on second-order statistics. However, if any non-Gaussian noise is expected to be involved, in other embodiments, the second-order statistics may be expanded to higher-order statistics by using mutual information or entropy as a more sophisticated measurement.
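As an illustration of a second-order correlation measure, the sketch below computes the Pearson correlation coefficient between the scores that two models assign to the same loans. The choice of Pearson correlation and the function name are assumptions; the description only states that second-order statistics are used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Illustrative scores for the same four loans from two hypothetical models.
fraud_scores = [100, 400, 700, 900]
epd_scores = [150, 420, 650, 880]
rho = pearson(fraud_scores, epd_scores)
```

A value of rho close to 1 would suggest that the two scores are largely redundant, which under the approach described above may lead to a reduced weighting of the pair in the combined model. The function assumes neither list is constant, since a zero variance would make the coefficient undefined.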

Model Combination Process: Swap Analysis

In addition to or in place of correlation analysis, at block 306 the model combining module 122 may perform swap analysis on the results and inputs of the models 110 based on application of the received data (e.g., tagged data). Swap analysis may be performed in the input space without reference to the score prediction performance. A swap analysis shows the overlap and discrepancy of the review population based on the different outsorting logic of the respective scores of the models 110. The proportion of overlap in the reviewed population conforms to the correlation analysis, where a high portion of overlap means high correlation between the scores. The swap analysis further measures the similarity between the models in terms of the prediction performance, based on the associated tags for the particular transaction (e.g., fraud, early payment default, default, fraud plus early payment default, fraud plus default, etc.). A set of models 110 that has a small volume of overlapping detected bad loans under the same review rate demonstrates that the models are capable of detecting different types of bad loans. Therefore, the combined scores of such models 110 are likely to score more accurately than the individual models.
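A swap analysis of two models at a common review rate can be sketched as below. The example outsorts the highest-scoring fraction of loans under each model, then reports which loans both models would send to review, which only one would, and how many tagged bad loans fall in each group. The function names, loan identifiers, and scores are hypothetical.

```python
def top_reviewed(scores, review_rate):
    """Return the ids of the highest-scoring loans at the given review rate."""
    k = max(1, int(len(scores) * review_rate))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def swap_analysis(scores_a, scores_b, bad_loans, review_rate):
    """Compare the review populations that two models outsort at the same rate."""
    reviewed_a = top_reviewed(scores_a, review_rate)
    reviewed_b = top_reviewed(scores_b, review_rate)
    return {
        "overlap": reviewed_a & reviewed_b,
        "only_a": reviewed_a - reviewed_b,
        "only_b": reviewed_b - reviewed_a,
        "bad_caught_by_both": reviewed_a & reviewed_b & bad_loans,
        "bad_caught_by_either": (reviewed_a | reviewed_b) & bad_loans,
    }


# Illustrative scores and tags only.
scores_a = {"L1": 900, "L2": 800, "L3": 300, "L4": 200}
scores_b = {"L1": 850, "L2": 100, "L3": 700, "L4": 150}
bad_loans = {"L2", "L3"}
result = swap_analysis(scores_a, scores_b, bad_loans, review_rate=0.5)
```

In this illustrative data the two models share no detected bad loans, yet together they catch both, which is the situation in which combining scores is expected to help most.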

Model Combination Process: Feature Extraction

Moving to a block 308, the model combining module 122 may extract features for creating the combined model 112. Feature extraction is the process of designing predictive input features to build models such as the model 112. This process may include application of a significant amount of domain knowledge in granular details of mortgage fraud and mortgage risk and be performed at least in part by a human analyst. Such domain knowledge is combined with the data-driven analysis to select the features due to their predictiveness and robustness from both technical and business points of view.

The complexity of the feature extraction is directly related to the modeling method. To achieve a same level of predictive power for a complex classification problem, a simpler linear model typically requires a more complex feature encoding. On the other hand, a more complex nonlinear model may have less demand on the features. In either case, robust features will always assist in obtaining better performance. Different modeling methods will typically select different sets of features. The feature extraction for the combined model 112 may comprise identifying (1) the interaction among the individual model scores from the respective models 110, (2) the interaction between an individual model score and other input fields outside the scope of the respective model (such as loan amount or borrower's years on a particular job, in the case of mortgage fraud or default prediction), and (3) derivatives of such data. Once features are extracted, one or more feature selection algorithms may be performed to select the best subset of features that are most predictive and relevant. Feature selection methods can be classified as Wrapper, Filter, and Embedded, which are methods for selecting features for the purposes of building predictive models. In one embodiment, suitable feature selection methods include forward/backward stepwise selection, sensitivity analysis, correlation analysis, and class separability measure. The list below illustrates a number of example data points from which input features may be selected:

-   fraud detection model score
-   multi-component risk model overall score
    -   collateral component risk score
    -   broker component risk score
    -   borrower component risk score
    -   market component risk score
-   early payment default risk model score
-   loan balance

As shown in block 128 of FIG. 1A, the individual model scores and other data points are selected and processed (i.e., mathematically manipulated and/or combined) to create input features for the combined model at run time. For example, as discussed above, the individual model scores may need to be normalized on the same scale. In one embodiment, the selection and processing performed at run time are based on the outcome of the feature extraction step performed during the combined model creation process. As an example, if the feature extraction process (performed by the model combining module 122 in one embodiment) at time of model creation selects features A and B, the input selection module 128 at run time will create features A and B based on the individual model scores and data points for input to the combined model. Example input features may include, or be based on a combination of, the results of some of the following operations on the data points (such as those referenced above):

-   the maximum of several scores and/or non-score data points
-   the minimum of several scores and/or non-score data points
-   the average of several scores and/or non-score data points
-   the dynamic range of the several scores and/or non-score data points (max-min)
-   the ratio of the dynamic range over the average
-   the loan balance

Thus, as a further example, after the combined model is created and placed into operation, the “several scores” referenced above may be determined to be individual model score “A,” score “B,” and score “C.” Hence, at run time, when a particular application is under evaluation, the input selection block 128 may choose score “A,” score “B,” and score “C” from all the score outputs 130 from the individual models 110 as applied to the subject loan application data and related data. The input selection block 128 may then perform the mathematical operations (e.g., max(Score A, Score B, Score C)) that are necessary to create the input features to be supplied into the combined model 112 to generate the final combined score. In some embodiments, a chosen score for creating an input feature may be a component score or a sub-score of one of the models 110 (e.g., the borrower component risk score of the multi-component model 113).
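By way of illustration only, the run-time feature creation described above might be sketched as follows. The function name, the dictionary keys, and the sample values are assumptions made for this example, and the scores are assumed to be already normalized to a common scale.

```python
def build_input_features(scores, loan_balance):
    """Derive example input features from selected individual model scores.

    `scores` holds the selected scores (e.g., score A, score B, score C).
    """
    highest = max(scores)
    lowest = min(scores)
    average = sum(scores) / len(scores)
    dynamic_range = highest - lowest
    return {
        "max": highest,
        "min": lowest,
        "average": average,
        "dynamic_range": dynamic_range,
        # Guard against a zero average; 0.0 is an arbitrary fallback.
        "range_over_average": dynamic_range / average if average else 0.0,
        "loan_balance": loan_balance,
    }


# Hypothetical scores A, B, and C for one loan application.
features = build_input_features([700, 500, 300], loan_balance=250_000)
```

The resulting dictionary would then be supplied to whatever combined model was built at model creation time.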

Model Combination Process: Model Building

Moving to a block 310, in one embodiment the model combining module 122 executes a machine learning or data mining algorithm to generate a combined model that distinguishes the fraudulent from the non-fraudulent transactions based at least in part on output of other models 110. In particular, after a pool of potential features has been created, a certain model structure and modeling techniques may be determined according to the data itself.

As further illustrated in FIG. 3B, generating the combined model 112 includes selecting modeling structure(s) (block 322) and modeling method(s)/technique(s) (block 324). In one embodiment, human analysts generate initial model structures and select the modeling methods used in the combined model 112. The combined model 112 may be subsequently updated based on new or updated data (e.g., tagged historical data) to adapt the model 112 to evolving fraud and/or risk trends.

The combined model 112 may comprise any suitable structure of individual models 110. For example, the combined model 112 may comprise model structures including one or more of a cascaded structure, a divide-and-conquer structure, and a mixed structure.

In a cascaded structure, scores of individual models 110 are ranked in a specified order, e.g., model 1 . . . N. The first model score is initially joined with input fields to generate an intermediate stage 1 score; the second model score is again joined with the stage 1 score together with input fields to generate an intermediate stage 2 score; and so on. The last model score is joined with the stage N-1 score (or all the previous scores) together with input fields to generate the output of the overall model 112. In each cascaded stage, the tag information can be either the same for all the cascades or have different types of risk in cascades (if the target for each stage is the residue between the tag and the previous score starting from the second stage, it implements a boosting methodology).

In a divide-and-conquer structure, each individual model 110 acts as an independent module and a combination gate incorporates all the model scores with the other interactive input fields to produce the final output score.
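The cascaded and divide-and-conquer structures described above can be contrasted with a short structural sketch. The stage function `blend_stage` and the gate function `mean_gate` are toy placeholders assumed for this example; in practice each stage or gate would be a trained model that also uses the input fields.

```python
def run_cascade(model_scores, input_fields, stage_fn):
    """Feed each model score, in order, into a stage with the prior stage score."""
    stage_score = None
    for model_score in model_scores:
        stage_score = stage_fn(model_score, stage_score, input_fields)
    return stage_score


def run_divide_and_conquer(model_scores, input_fields, gate_fn):
    """Pass all independent model scores to a single combination gate."""
    return gate_fn(model_scores, input_fields)


def blend_stage(model_score, previous_stage_score, input_fields):
    """Toy stage (assumption): average the model score with the prior stage score."""
    if previous_stage_score is None:
        return float(model_score)
    return 0.5 * previous_stage_score + 0.5 * model_score


def mean_gate(model_scores, input_fields):
    """Toy gate (assumption): plain average of all model scores."""
    return sum(model_scores) / len(model_scores)


cascade_output = run_cascade([800, 400, 600], {}, blend_stage)
gate_output = run_divide_and_conquer([800, 400, 600], {}, mean_gate)
```

The difference lies in data flow: the cascade threads an intermediate score through ordered stages, whereas the gate sees every model score at once.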

In a mixed structure, any module of cascaded or divide-and-conquer structures may be replaced by another network of further individual models. For example, in the cascaded structure, the last stage of the cascaded structure can be a divide-and-conquer structure. As a further example, in the divide-and-conquer structure, one or more of the modules can be replaced by a cascaded structure.

Once the structure of the model 112 is selected at block 322, in one embodiment a suitable modeling technique/method is applied to generate each individual model at block 324. Suitable modeling methods may include, but are not limited to, machine learning/data mining techniques such as linear regression, logistic regression, neural networks, support vector machines, decision trees, and their derivatives. In practice, one technique can be used in the research effort to provide insights for another modeling technique. Thus a combination of techniques can be used in the analysis and in the product implementation.

As discussed above, suitable modeling methods include linear regression and/or logistic regression. Linear regression is a widely used statistical method that can be used to predict a target variable using a linear combination of multiple input variables. Logistic regression is a generalized linear model applied to classification problems. It predicts log odds of a target event occurring using a linear combination of multiple input variables. These linear methods have the advantage of robustness and low computational complexity. These methods are also widely used to classify non-linear problems by encoding the nonlinearity into the input features. Although the mapping from the feature space to the output space is linear, the overall mapping from input variables through features to output is nonlinear and thus such techniques are able to classify the complex nonlinear boundaries. Desirably, the linear mapping between the feature space and the output space may make the final score easy to interpret for the end users.
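A minimal sketch of this idea follows. The weights, the bias, and the interaction feature (the product of two scores) are hypothetical values chosen only to show how a nonlinearity can be encoded in the features while the mapping from features to log odds remains linear.

```python
import math


def encode_features(score_a, score_b):
    """Encode a nonlinearity as an extra feature: the product of the two scores."""
    return [score_a, score_b, score_a * score_b]


def log_odds(features, weights, bias):
    """Linear combination of features, interpreted as log odds of the event."""
    return bias + sum(w * x for w, x in zip(weights, features))


def probability(z):
    """Convert log odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and bias; in practice these are fit to tagged data.
features = encode_features(1.0, 2.0)
z = log_odds(features, weights=[0.5, -0.25, 1.0], bias=-2.0)
p = probability(z)
```

Because each weight multiplies exactly one feature, the contribution of every feature to the log odds can be read off directly, which is the interpretability benefit noted above.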

Another suitable modeling method is neural networks. Logistic regression generally needs careful coding of feature values, especially when complex nonlinear problems are involved. Such encoding needs good domain knowledge and in many cases involves trial-and-error efforts that could be time-consuming. A neural network has such nonlinearity classification/regression embedded in the network itself and can theoretically achieve universal approximation, meaning that it can classify any degree of complex problems if there is no limit on the size of the network. However, neural networks are more vulnerable to noise and it may be more difficult for the end users to interpret the results. In one embodiment, one suitable neural network structure is a feed-forward network with one hidden layer trained by back-propagation. Neural networks may provide more robust models to be used in production environments when based on a larger data set than would be needed to provide robust models from logistic regression. Also, the number of hidden nodes in the single hidden layer is important: too many nodes and the network will memorize the details of the specific training set and not be able to generalize to new data; too few nodes and the network will not be able to learn the training patterns very well and may not be able to perform adequately. Neural networks are often considered to be “black boxes” because of their intrinsic non-linearity. Hence, in embodiments where neural networks are used, when higher risk scores are returned, accompanying reasons are also provided. One such option is to provide risk indicators in conjunction with scores generated by neural network based models, so that the end user can more fully understand the decisions behind the high risk scores.

Embodiments may also include models 112 or components of the models 112 that are based on support vector machines (SVMs). An SVM is a maximum margin classifier that involves solving a quadratic programming problem in the dual space. Since the margin is maximized, it will usually lead to low generalization error. One of the desirable features of SVMs is that such a model can cure the “curse of dimensionality” by implicit mapping of the input vectors into high-dimensional vectors through the use of kernel functions in the input space. An SVM can be a linear classifier to solve the nonlinear problem. Since all the nonlinear boundaries in the input space can be linear boundaries in the high-dimensional functional space, a linear classification in the functional space provides the nonlinear classification in the input space. It is to be recognized that such models may require a very large volume of independent data when the input dimension is high.

Embodiments may also include models 112 or components of the models 112 that are based on decision trees. Decision trees are generated using a machine learning algorithm that uses a tree-like graph to predict an outcome. Learning is accomplished by partitioning the source set into subsets using an attribute value in a recursive manner. This recursive partitioning is finished when pre-selected stopping criteria are met. Decision trees were initially designed to solve classification problems using categorical variables. They can also be extended to solve regression problems using regression trees. The Classification and Regression Tree (CART) methodology is one suitable approach to decision tree modeling. Depending on the tree structure, the compromise between granular classification (which may have extremely good detection performance) and generalization presents a challenge for the decision tree. Like logistic regression, results from decision trees are easy to interpret for the end users.

Once the modeling structure and the modeling method are determined, the model 112 is trained based on the historical data adaptively. The parameters of the model “learn” or automatically adjust to the behavioral patterns in the historical data and then generalize these patterns for detection purposes. When a new loan is scored, the model 112 will generate a combined score to evaluate its risk based on what it has learned in its training history. The modeling structure and modeling techniques for generating the model 112 may be adjusted in the training process recursively.

The listing of modeling structures and techniques provided herein is not exhaustive. Those skilled in the art will appreciate that other predictive modeling structures and techniques may be used in various embodiments. Example predictive modeling structures and techniques may include Genetic Algorithms, Hidden Markov Models, Self Organizing Maps, Dynamic Bayesian Networks, Fuzzy Logic, and Time Series Analysis. In addition, in one embodiment, a combination of the aforementioned modeling techniques and other suitable modeling techniques may be used in the combined model 112.

Combined Model Performance Evaluation

The performance of the combined model 112 may be evaluated in its predictive power and generalization prior to release to production. For example, in one embodiment, at a block 326, the performance of a combined model 112 is evaluated on both the training dataset and the testing dataset, where the testing dataset is not used during the model development. The difference between the performance in the training data and the testing data demonstrates how robust the model is and how much the model is able to generalize to other datasets. The closer the two performances are, the more robust the model is.

A number of suitable metrics may be used to evaluate the predictive ability of the combined model 112. One embodiment uses a commonly used metric called the Receiver Operating Characteristic (ROC) curve. ROC demonstrates how many bad loans are detected by the model under a certain review volume by showing the adaptive boundary change using different score thresholds. This metric is independent of the intrinsic fraud (or bad) rate in the data and thus is a good metric to compare across differing data sets. In one embodiment, the derivative of ROC is also used to demonstrate how much total value in the bad loans is detected by the model under a certain review volume. In one embodiment, the ROC charts are plotted for the combined model 112 and all the individual model scores alone, so that improvement in performance can be easily seen at all review rates. In one embodiment, performance improvement is measured using one or more of the following metrics: false positive rate, fraud amount detection rate (the total dollar amount of fraudulent loans detected), and count detection rate (the total instances of fraudulent loans detected).
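As an illustrative sketch, the metrics named above can be computed for a single review rate as shown below; repeating the calculation over a range of review rates traces out a curve of the kind described. The tuple layout, the function name, and the sample loans are assumptions, the rates are expressed as fractions of the respective totals, and the data is assumed to contain at least one bad loan and one good loan.

```python
def detection_metrics(loans, review_rate):
    """Compute detection metrics when the top-scoring loans are reviewed.

    `loans` is a list of (score, is_bad, amount) tuples.
    """
    ranked = sorted(loans, key=lambda loan: loan[0], reverse=True)
    reviewed = ranked[:round(len(ranked) * review_rate)]

    total_bad = sum(1 for _, is_bad, _ in loans if is_bad)
    total_good = len(loans) - total_bad
    total_bad_amount = sum(amount for _, is_bad, amount in loans if is_bad)

    caught = [loan for loan in reviewed if loan[1]]
    false_alarms = len(reviewed) - len(caught)
    return {
        "count_detection_rate": len(caught) / total_bad,
        "amount_detection_rate": sum(l[2] for l in caught) / total_bad_amount,
        "false_positive_rate": false_alarms / total_good,
    }


# Hypothetical scored loans: (combined score, tagged bad?, loan amount).
loans = [
    (900, True, 300_000),
    (800, False, 200_000),
    (700, True, 100_000),
    (200, False, 150_000),
]
metrics = detection_metrics(loans, review_rate=0.5)
```

Running the same function on the combined score and on each individual model score at the same review rate gives a like-for-like comparison.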

Finally, at a block 328, the generated combined model 112 may be adjusted and/or retrained as needed. For example, the combined model may be adjusted to use a different modeling technique, based on the evaluation of the model performance. The adjusted combined model 112 may then be re-trained. In another example, the combined model may be re-trained using updated and/or expanded data (e.g., historical transaction data) as they become available.

Scoring Process Using the Combined Model

FIG. 3C is a flowchart illustrating an example of a method using the combined model 112 to generate a combined risk score as indicated in block 210 of FIG. 2. The method begins at a block 342 in which the system receives data from which a combined score is to be calculated, including data associated with a particular mortgage transaction for processing as well as other data external to the transaction such as credit data, public record data, etc. The mortgage transaction data may comprise data of a mortgage application, an issued mortgage, or any other suitable loan or application. Data may be received from the loan origination system 116, the storage 104, and/or other data sources.

Next at a block 344, the system 100 (e.g., one or more processors of a computer system associated with the system 100) applies the individual models 110 to the received data to generate risk scores from the models. At a block 346, the generated scores are selected, depending on the combined model that is created or in use. In one embodiment, more than one combined model may be created, and each combined model may select a different mix of scores from the individual models. The selected scores and potentially other input data (e.g., a loan balance amount) may also be processed, i.e., combined and/or mathematically manipulated into input features that will serve as input to the combined model that is in use. At a block 348, the system 100 may use the combined model with the input features to generate the combined score. Moving to a block 350, the system 100 may optionally generate a report providing the combined score and associated risk indicators. In one embodiment, the combined model 112 may selectively output the risk indicators generated by the individual models 110, e.g., based on the weighting or a model result in the combined model 112. For example, risk indicators associated with the selected individual model scores are provided as output.

FIG. 4 is an example report that is generated by the risk detection and assessment system 100 using a combined model 112. As shown, the example report includes a combined score 402 and a plurality of risk indicators 404, 406, 408, 410, and 412. In this example, the risk indicators are grouped by category. For example, risk indicators 404 are related to income/employment of the loan applicant and risk indicators 406 are related to the subject property of the mortgage. As discussed above, besides generating a combined risk score 402, the risk detection and assessment system 100 may also output these risk indicators to alert the end users as to the individual risk factors or components that contributed to the combined risk score. The example report 400 in FIG. 4 shows that the subject mortgage transaction has been classified as “high risk,” and a number of specific risks are identified by the risk indicators with corresponding recommendations, so an end user can take corrective actions in view of the risks. In addition, as shown, each risk indicator may include a classification of “high risk,” “moderate risk,” or “low risk.” In one embodiment, the classification is reflective of the contributing weight of the identified risk to the combined risk score 402.

Individual Models

Example models that may be included in the individual models 110 are further described in the following sections.

Fraud Detection Model

As discussed above, the models 110 in one embodiment include the historical transaction based fraud detection model 111, which is derived from mortgage loan data, borrower data, financial data, and other additional data. This may include data related to historical transactions. The model is built from statistical information that is stored according to groups of individuals that form clusters. In one such embodiment, fraud is identified with reference to deviation from identified clusters. In one embodiment, in addition to data associated with the mortgage applicant, the mortgage fraud detection system may use data that is stored in association with one or more entities associated with the processing of the mortgage transaction such as brokers, appraisers, or other parties to mortgage transactions. The entities may be real persons or may refer to business associations, e.g., a particular appraiser, or an appraisal firm. Fraud generally refers to any material misrepresentation associated with a loan application and may include any misrepresentation which leads to a higher probability for the resulting loan to default, become un-sellable, or require discount in the secondary market.

FIG. 5A is a functional block diagram further illustrating an example of a fraud detection system including the historical transaction based fraud detection model or models 111. The model 111 may include an origination system interface 522 providing mortgage application data to a data preprocessing module 524. The origination system interface 522 may receive data from, for example, the mortgage origination system 116 of FIG. 1. In other embodiments, the origination system interface 522 may be configured to receive data associated with funded mortgages and may be configured to interface with suitable systems other than, or in addition to, mortgage origination systems. For example, in one embodiment, the system interface 522 may be configured to receive “bid tapes” or other collections of data associated with funded mortgages for use in evaluating fraud associated with a portfolio of funded loans. In one embodiment the origination system interface 522 comprises a computer network that communicates with the origination system 116 to receive applications in real time or in batches. In one embodiment, the origination system interface 522 receives batches of applications via a data storage medium.

Fraud Detection Model: Pre-Processing of Loan Application Data

The origination system interface 522 provides application data to the data preprocessing module 524 which formats application data into data formats used internally in the model 111. For example, the origination system interface 522 may also provide data from additional sources such as credit bureaus that may be in different formats for conversion by the data preprocessing module 524 into the internal data formats of the model 111. The origination system interface 522 and preprocessing module 524 also allow at least portions of a particular embodiment of the model 111 to be used to detect fraud in different types of credit applications and for different loan originators that have varying data and data formats. A table listing examples of mortgage application data that may be used in various embodiments can be found in the previously incorporated U.S. Pat. No. 7,587,348 entitled “SYSTEM AND METHOD OF DETECTING MORTGAGE RELATED FRAUD.”

Various features described with respect to the system illustrated in FIG. 5A for receiving data, preprocessing data, and processing scores output by the system may be used with any of the models 110 illustrated in FIG. 1. Moreover, any of the data described in Table 1 of the '348 patent may be used with any other of the models 110, which may also use data additional to that illustrated in Table 1 of the '348 patent.

The preprocessing module 524 may be configured to identify missing data values and provide data for those missing values to improve further processing. For example, the preprocessing module 524 may generate application data to fill missing data fields using one or more rules. Different rules may be used depending on the loan data supplier, on the particular data field, and/or on the distribution of data for a particular field. For example, for categorical fields, the most frequent value found in historical applications may be used. For numerical fields, the mean or median value of historical applications may be used. In addition, other values may be selected such as a value that is associated with the highest risk of fraud (e.g., assume the worst) or a value that is associated with the lowest risk of fraud (e.g., assume the best). In one embodiment, a sentinel value, e.g., a specific value that is indicative of a missing value to one or more fraud models, may be used (allowing the fact that particular data is missing to be associated with fraud).
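A few of the fill rules described above might be sketched as follows. The rule names, the sentinel value of -999, and the sample histories are hypothetical; the median is used for numerical fields, although the mean would serve equally well under the description above.

```python
from statistics import median, mode

MISSING_SENTINEL = -999  # hypothetical value that downstream models treat as "missing"


def fill_missing(value, history, rule):
    """Return `value` unchanged, or a fill value chosen by the named rule."""
    if value is not None:
        return value
    if rule == "categorical":
        return mode(history)      # most frequent historical value
    if rule == "numerical":
        return median(history)    # median of historical values
    return MISSING_SENTINEL       # flag the gap so missingness itself can be modeled


property_type = fill_missing(None, ["SFR", "Condo", "SFR"], "categorical")
income = fill_missing(None, [100, 300, 200], "numerical")
appraisal = fill_missing(None, [], "sentinel")
```

Which rule applies to which field could be configured per data supplier, consistent with the rule selection described above.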

The preprocessing module 524 may also be configured to identify erroneous data or missing data. In one embodiment, the preprocessing module 524 extrapolates missing data based on data from similar applications, similar applicants, or using default data values. The preprocessing module 524 may perform data quality analysis such as one or more of critical error detection, anomaly detection, and data entry error detection. In one embodiment, applications failing one or more of these quality analyses may be logged to a data error log database 526.

In critical error detection, the preprocessing module 524 identifies applications that are missing data whose absence is likely to confound further processing. Such missing data may include, for example, appraisal value, borrower credit score, or loan amount. In one embodiment, no further processing is performed on such applications and a log or error entry is stored to the database 526 and/or provided to the loan origination system 116.

In anomaly detection, the preprocessing module 524 identifies continuous application data values that may be indicative of data entry error or of material misrepresentations. For example, high loan or appraisal amounts (e.g., above a threshold value) may be indicative of data entry error or fraud. Other anomalous data may include income or age data that is outside selected ranges. In one embodiment, such anomalous data is logged and the log provided to the origination system 116. In one embodiment, the model 111 processes applications with anomalous data. The presence of anomalous data may be logged to the database 526 and/or included in a score output or report for the corresponding application.

In data entry error detection, the preprocessing module 524 identifies non-continuous data such as categories or coded data that appear to have data entry errors. For example, telephone numbers or zip codes that have too many or too few digits, incomplete social security numbers, toll free numbers as home or work numbers, or other category data that fails to conform to input specifications may be logged. The presence of such data entry errors may be logged to the database 526 and/or included in a score output or report for the corresponding application.
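The checks listed above could take a form such as the following sketch. The field names, the error labels, and the assumption of U.S. formats (five-digit or ZIP+4 codes, ten-digit telephone numbers, nine-digit social security numbers, and a fixed set of toll free prefixes) are all illustrative and are not specified by the disclosure.

```python
import re

# Assumed set of U.S. toll free area codes.
TOLL_FREE_PREFIXES = {"800", "833", "844", "855", "866", "877", "888"}


def data_entry_errors(application):
    """Return labels for fields that fail simple format checks."""
    errors = []

    if not re.fullmatch(r"\d{5}(-\d{4})?", application.get("zip", "")):
        errors.append("zip")

    phone_digits = re.sub(r"\D", "", application.get("home_phone", ""))
    if len(phone_digits) != 10:
        errors.append("home_phone_length")
    elif phone_digits[:3] in TOLL_FREE_PREFIXES:
        errors.append("home_phone_toll_free")

    ssn_digits = re.sub(r"\D", "", application.get("ssn", ""))
    if len(ssn_digits) != 9:
        errors.append("ssn")

    return errors


suspect = {"zip": "9210", "home_phone": "800-555-0100", "ssn": "123-45-678"}
clean = {"zip": "92101", "home_phone": "(619) 555-0100", "ssn": "123-45-6789"}
suspect_errors = data_entry_errors(suspect)
clean_errors = data_entry_errors(clean)
```

The returned labels could be written to an error log or attached to the score report for the application.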

In one embodiment, the preprocessing module 524 queries an input history database 528 to determine if the application data is indicative of a duplicate application. A duplicate may indicate that the same application has been resubmitted, either fraudulently or erroneously. Duplicates may be logged. In one embodiment, no further processing of duplicates is performed. In other embodiments, processing of duplicates continues and may be noted in the final report or score. If no duplicate is found, the application data is stored to the input history database 528 to identify future duplicates.
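One simple way to realize such a duplicate check is sketched below, using an in-memory set in place of a database. The class name, the choice of key fields, and the normalization (trimming and lower-casing) are assumptions for illustration; the disclosure does not specify how duplicates are matched.

```python
import hashlib

# Assumed fields that together identify an application.
KEY_FIELDS = ("borrower_name", "ssn", "property_address", "loan_amount")


def fingerprint(application, fields=KEY_FIELDS):
    """Hash the normalized key fields of an application."""
    normalized = "|".join(
        str(application.get(field, "")).strip().lower() for field in fields
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class InputHistory:
    """In-memory stand-in for an input history store."""

    def __init__(self):
        self._seen = set()

    def check_and_store(self, application):
        """Return True for a duplicate; otherwise store the fingerprint."""
        key = fingerprint(application)
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


history = InputHistory()
first = {"borrower_name": "Pat Doe", "ssn": "123-45-6789",
         "property_address": "1 Main St", "loan_amount": 250000}
resubmitted = {"borrower_name": " pat doe", "ssn": "123-45-6789",
               "property_address": "1 MAIN ST", "loan_amount": 250000}
results = [history.check_and_store(first), history.check_and_store(resubmitted)]
```

An exact-match fingerprint only catches trivially altered resubmissions; fuzzier matching would be needed for applications with changed field values.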

Fraud Detection Model: Entity Based Loan Models

The data preprocessing module 524 provides application data to one or more models for fraud scoring and processing. In one embodiment, application data is provided to one or more loan models 532 that generate data indicative of fraud based on application and applicant data. The data indicative of fraud generated by the loan models 532 may be provided to an integrator 536 that combines scores from one or more models into a final score. The data preprocessing module 524 may also provide application data to one or more entity models 540 that are configured to identify fraud based on data associated with entities involved in the processing of the application. Entity models may include models of data associated with loan brokers, loan officers or other entities involved in a loan application. More examples of such entity models 540 are illustrated with reference to FIG. 5B. Each of the entity models may output data to an entity scoring module 550 that is configured to provide a score and/or one or more risk indicators associated with the application data. The term “risk indicator” refers to data values identified with respect to one or more data fields that may be indicative of fraud. The entity scoring module 550 may provide scores associated with one or more risk indicators associated with the particular entity or application. For example, appraisal value in combination with zip code may be a risk indicator associated with an appraiser model. In one embodiment, the entity scoring module 550 provides scores and indicators to the integrator 536 to generate a combined fraud score and/or set of risk indicators.

In one embodiment, the selection of risk indicators is based on criteria such as domain knowledge, and/or correlation coefficients between entity scores and fraud rate, if entity fraud rate is available. The correlation coefficient r_(i) between entity score s^(i) for risk indicator i and entity fraud rate f is defined as

$r_{i} = \frac{\sum\limits_{j = 1}^{N}\left( s_{j}^{i} - \bar{s} \right)\left( f_{j} - \bar{f} \right)}{\left( N - 1 \right)\,\mathrm{SD}\left( s^{i} \right)\,\mathrm{SD}(f)}$

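where s^(i)_(j) is the score for entity j on risk indicator i; f_(j) is the fraud rate for entity j; the barred quantities are the corresponding means over the N entities; and SD denotes the standard deviation. If r_(i) is larger than a pre-defined threshold, then the risk indicator i is selected.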
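The selection rule above can be sketched in a few lines. The function names, indicator names, sample scores, and the threshold of 0.5 are hypothetical; the standard deviation is taken to be the sample standard deviation, which matches the N-1 factor in the denominator.

```python
from statistics import mean, stdev


def indicator_correlation(scores, fraud_rates):
    """Correlation between per-entity indicator scores and entity fraud rates."""
    n = len(scores)
    s_bar, f_bar = mean(scores), mean(fraud_rates)
    covariance_sum = sum(
        (s - s_bar) * (f - f_bar) for s, f in zip(scores, fraud_rates)
    )
    return covariance_sum / ((n - 1) * stdev(scores) * stdev(fraud_rates))


def select_indicators(indicator_scores, fraud_rates, threshold):
    """Keep the risk indicators whose correlation exceeds the threshold."""
    return [
        name
        for name, scores in indicator_scores.items()
        if indicator_correlation(scores, fraud_rates) > threshold
    ]


# Hypothetical scores for four entities on two risk indicators.
fraud_rates = [0.1, 0.2, 0.3, 0.4]
indicator_scores = {
    "loan_value": [1.0, 2.0, 3.0, 4.0],       # rises with fraud rate
    "days_to_close": [4.0, 3.0, 2.0, 1.0],    # falls with fraud rate
}
selected = select_indicators(indicator_scores, fraud_rates, threshold=0.5)
```

Only the indicator whose scores rise with the entity fraud rate clears the threshold in this toy data.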

In one embodiment, the entity scoring module 550 combines each of the risk indicator scores for a particular entity using a weighted average or other suitable combining calculation to generate an overall entity score. In addition, the risk indicators having higher scores may also be identified and provided to the integrator 536.

In one embodiment, the combined score for a particular entity may be determined using one or more of the following models:

-   An equal weight average:

$s_{c} = \frac{1}{N}\sum\limits_{i = 1}^{N} s^{i},$

where N is the number of risk indicators;

-   A weighted average:

$s_{c} = \sum\limits_{i = 1}^{N} s^{i}\alpha^{i},$

where N is the number of risk indicators and α^(i) is estimated based on how predictive risk indicator i is on individual loan level; and

-   A competitive committee:

$s_{c} = \frac{1}{M}\sum\limits_{i = 1}^{M} s^{i},$

where s^(i) ∈ (set of largest M risk indicator scores).
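The three combining models above translate directly into code. The sketch below uses hypothetical risk indicator scores and weights; how the weights are estimated is outside its scope.

```python
def equal_weight_average(scores):
    """Mean of all N risk indicator scores."""
    return sum(scores) / len(scores)


def weighted_average(scores, weights):
    """Sum of each risk indicator score times its weight."""
    return sum(s * a for s, a in zip(scores, weights))


def competitive_committee(scores, m):
    """Mean of the M largest risk indicator scores."""
    top = sorted(scores, reverse=True)[:m]
    return sum(top) / len(top)


# Hypothetical risk indicator scores for one entity.
scores = [200, 400, 900]
s_equal = equal_weight_average(scores)
s_weighted = weighted_average(scores, weights=[0.2, 0.3, 0.5])
s_committee = competitive_committee(scores, m=2)
```

The competitive committee lets a few high-scoring indicators dominate, so an entity with one or two strongly anomalous indicators is not averaged down by its unremarkable ones.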

If entity fraud rate or entity performance data (EPD) rate is available, the fraud/EPD rate may be incorporated with the entity committee score to generate the combined entity score. The entity score S_(E) may be calculated using one of the following equations:

$S_{E} = S_{C}$, if relative entity fraud/EPD rate ≤ 1;

$S_{E} = S_{D} + \min\left( \alpha \cdot \max\left( \text{absoluteFraudRate}, \text{absoluteEPDRate} \right),\ 0.99 \right)\left( 998 - S_{D} \right)$, if relative entity fraud/EPD rate > 1 and $S_{C} < S_{D}$;

$S_{E} = S_{C} + \min\left( \alpha \cdot \max\left( \text{absoluteFraudRate}, \text{absoluteEPDRate} \right),\ 0.99 \right)\left( 998 - S_{C} \right)$, if relative entity fraud/EPD rate > 1 and $S_{C} \geq S_{D}$;

where $\alpha = b \cdot \tanh\left( \alpha \cdot \left( \max\left( \text{relativeFraudRate}, \text{relativeEPDRate} \right) - 1 \right) \right)$.

The preprocessing module 524 may also provide application data to a risky file processing module 556. In addition to application data, the risky file processing module 556 is configured to receive files from a risky files database 554. “Risky” files include portions of applications that are known to be fraudulent. It has been found that fraudulent applications are often resubmitted with only insubstantial changes in application data. The risky file processing module 556 compares each application to the risky files database 554 and flags applications that appear to be resubmissions of fraudulent applications. In one embodiment, risky file data is provided to the integrator 536 for integration into a combined fraud score or report.

The integrator 536 applies weights and/or processing rules to generate one or more scores and risk indicators based on the data indicative of fraud provided by one or more of the loan models 532, the entity models 540 and entity scoring modules 550, and the risky file processing module 556. In one embodiment, the integrator 536 generates a single score indicative of fraud along with one or more risk indicators relevant for the particular application. Additional scores may also be provided with reference to each of the risk indicators. The integrator 536 may provide this data to a scores and risk indicators module 560 that logs the scores to an output history database 560. In one embodiment, the scores and risk indicators module 560 identifies applications for further review by the risk manager 118 of FIG. 1. Scores may be real or integer values. In one embodiment, scores are numbers in the range of 1-999. In one embodiment, thresholds are applied to one or more categories to segment scores into high and low risk categories. In one embodiment, thresholds are applied to identify applications for review by the risk manager 118. In one embodiment, risk indicators are represented as codes that are indicative of certain data fields or certain values for data fields. Risk indicators may provide information on the types of fraud and recommended actions. For example, risk indicators might include a credit score inconsistent with income, high risk geographic area, etc. Risk indicators may also be indicative of entity historical transactions, e.g., a broker trend that is indicative of fraud.

In one embodiment, the model generator 506 receives application data, entity data, and data on fraudulent and non-fraudulent applications and generates and updates models such as the entity models 540 either periodically or as new data is received.

FIG. 5B is a functional block diagram illustrating examples of the entity models 540 in the fraud detection model 111. It has been found that fraud detection performance can be increased by including models that operate on entities associated with a mortgage transaction that are in addition to the mortgage applicant. Scores for a number of different types of entities are calculated based on historical transaction data. The entity models may include one or more of an account executive model 542, a broker model 544, a loan officer model 546, and an appraiser (or appraisal) model 548. Embodiments may also include other entities associated with a transaction such as the lender. For example, in one embodiment, an unsupervised model, e.g., a clustering model such as k-means, is applied to risk indicators for historical transactions for each entity. A score for each risk indicator, for each entity, is calculated based on the relation of the particular entity to the clusters across the data set for the particular risk indicator.

By way of a simple example, for a risk indicator that is a single value, e.g., loan value for a broker, the difference between the loan value of each loan of the broker and the mean (assuming a simple Gaussian distribution of loan values), divided by the standard deviation of the loan values over the entire set of historical loans for all brokers, might be used as the score for that risk indicator. Embodiments that include more sophisticated clustering algorithms such as k-means may be used along with multi-dimensional risk indicators to provide for more powerful entity scores.
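The simple single-value example above amounts to a standardized (z-score) distance. A minimal sketch follows, assuming that both the mean and the standard deviation are computed over the historical loans of all brokers and that the population standard deviation is used; the function and parameter names are illustrative only.

```python
from statistics import mean, pstdev


def loan_value_scores(broker_loan_values, all_loan_values):
    """Standardized distance of each of a broker's loan values from the overall mean.

    Assumes `all_loan_values` holds the loan values of the historical loans
    of all brokers, as in the simple Gaussian example.
    """
    overall_mean = mean(all_loan_values)
    overall_std = pstdev(all_loan_values)
    if overall_std == 0:
        # All historical loan values are identical; no loan stands out.
        return [0.0 for _ in broker_loan_values]
    return [(value - overall_mean) / overall_std for value in broker_loan_values]
```

A loan value far from the overall mean yields a score of large magnitude, which may then be aggregated per broker.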

The corresponding entity scoring module 550 for each entity (e.g., account executive scoring module 552, broker scoring module 554, loan officer scoring module 556, and appraisal scoring module 558) may create a weighted average of the scores of a particular entity over a range of risk indicators that are relevant to a particular transaction.
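The weighted average described above might be sketched as follows. The risk indicator names and the weights are hypothetical; the disclosure does not specify how the weights are chosen.

```python
def weighted_entity_score(indicator_scores: dict, weights: dict) -> float:
    """Weighted average of an entity's scores over the relevant risk indicators.

    `indicator_scores` maps risk indicator names to the entity's scores;
    `weights` maps the same names to non-negative weights (assumed values).
    """
    total_weight = sum(weights[name] for name in indicator_scores)
    if total_weight == 0:
        raise ValueError("at least one relevant risk indicator needs a non-zero weight")
    weighted_sum = sum(score * weights[name] for name, score in indicator_scores.items())
    return weighted_sum / total_weight
```

Only the risk indicators present in `indicator_scores` contribute, so the same weight table can serve transactions for which different indicators are relevant.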

Fraud Detection Model: Supervised v. Unsupervised Models

FIG. 5C is a functional block diagram illustrating an example of the loan models 532 in the historical transaction based fraud detection model 111. In one embodiment, the loan models 532 may include one or more supervised models 570 and high risk rules models 572. Supervised models 570 are models that are generated based on training or data analysis that is based on historical transactions or applications that have been identified as fraudulent or non-fraudulent. Examples of implementations of supervised models 570 include scorecards, naïve Bayesian, decision trees, logistic regression, and neural networks. Particular embodiments may include one or more such supervised models 570.

In addition to their use with the loan models 532, such models and modeling methods and systems may also be used with respect to any of the models 110 and/or as part of the combining model 112.

The high risk rules models 572 may include expert systems, decision trees, and/or classification and regression tree (CART) models. The high risk rules models 572 may include rules or trees that identify particular data patterns that are indicative of fraud. In one embodiment, the high risk rules models 572 are used to generate scores and/or risk indicators.

In one embodiment, the rules, including selected data fields and condition parameters, are developed using the historical data used to develop the loan model 570. A set of high risk rule models 572 may be selected to include rules that have a low firing rate and a high hit rate. In one embodiment, when a rule $i$ is fired, it outputs a score $S_{\mathrm{rule}}^{i}$. The score represents the fraud risk associated with the rule. The score may be a function $S_{\mathrm{rule}}^{i} = f(\mathrm{hitRateOfRule}^{i}, \mathrm{firingRateOfRule}^{i}, \mathrm{scoreDistributionOfLoanAppModel})$, and $S_{\mathrm{rule}} = \max(S_{\mathrm{rule}}^{1}, \ldots, S_{\mathrm{rule}}^{N})$.
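The aggregation over fired rules might be sketched as follows. The disclosure leaves the function f unspecified, so the form used here (hit rate discounted by firing rate and scaled to a 1-999 range) is purely a placeholder assumption; for brevity it also omits the score distribution of the loan application model, which an actual embodiment may use for calibration.

```python
def rule_score(hit_rate: float, firing_rate: float) -> int:
    """Hypothetical stand-in for f: rewards a high hit rate and a low firing rate."""
    raw = 999 * hit_rate * (1.0 - firing_rate)
    return max(1, min(999, round(raw)))


def combined_rule_score(fired_rules):
    """S_rule = max over the scores of the rules that fired, or None if none fired.

    `fired_rules` is a list of (hit_rate, firing_rate) pairs, one per fired rule.
    """
    scores = [rule_score(hit_rate, firing_rate) for hit_rate, firing_rate in fired_rules]
    return max(scores) if scores else None
```

Taking the maximum means that a single strongly indicative rule determines the rules score regardless of how many weaker rules also fire.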

In one embodiment, the loan models 570 and 572 are updated when new versions of the model 111 are released into operation. In another embodiment, the supervised models 570 and the high risk rules models 572 are updated automatically. In addition, the supervised models 570 and the high risk rules models 572 may also be updated, for example, when new or modified data features or other model parameters are received.

Fraud Detection Model: Model Generation Processes

FIG. 6A is a flowchart illustrating an example of generating the loan models 532 in the model 111. The flowchart illustrates a method 600 of performing the block 202 of FIG. 2. Similar techniques may be applied to any of the models 110. Supervised learning algorithms identify a relationship between input features and target variables based on training data. In one embodiment, the target variables comprise the probability of fraud. Generally, the models used may depend on the size of the data and the complexity of the problem. For example, if the fraudulent exemplars in historical data are fewer than about 5000 in number, smaller and simpler models may be used, so that a robust model parameter estimation can be supported by the data size. The method 600 begins at a block 602 in which the model generator 106 receives historical mortgage data. The model generator 106 may extract and convert client historical data according to internal development data specifications, perform data analysis to determine data quality and availability, and rectify anomalies, such as missing data, invalid data, or possible data entry errors, similar to that described above with reference to preprocessing module 524 of FIG. 5A.

In addition, the model generator 106 may perform feature extraction including identifying predictive input variables for fraud detection models. The model generator 106 may use domain knowledge and mathematical equations applied to single or combined raw input data fields to identify predictive features. Raw data fields may be combined and transformed into discriminative features. Feature extraction may be performed based on the types of models for which the features are to be used. For example, linear models such as logistic regression and linear regression work best when the relationships between input features and the target are linear. If the relationship is non-linear, proper transformation functions may be applied to convert such data to a linear function. In one embodiment, the model generator 106 selects features from a library of features for use in particular models. The selection of features may be determined by availability of data fields, and the usefulness of a feature for the particular data set and problem. Embodiments may use techniques such as filter and wrapper approaches, including information theory, stepwise regression, sensitivity analysis, data mining, or other data driven techniques for feature selection.

In one embodiment, the model generator 106 may segment the data into subsets to better model input data. For example, if subsets of a data set are identified with significantly distinct behavior, special models designed especially for these subsets normally outperform a general fit-all model. In one embodiment, prior knowledge of the data can be used to segment the data for generation of models. For example, in one embodiment, data is segregated geographically so that, for example, regional differences in home prices and lending practices do not confound fraud detection. In other embodiments, data driven techniques, e.g., unsupervised techniques such as clustering, are used to identify data segments that may benefit from a separate supervised model.

Proceeding to a block 604, the model generator 106 identifies a portion of the applications in the received application data (or segment of that data) that were fraudulent. In one embodiment, the origination system interface 522 provides this labeling. Moving to a block 606, the model generator 106 identifies a portion of the applications that were non-fraudulent. Next at a block 608, the model generator 106 generates a model such as the supervised model 570 using a supervised learning algorithm to generate a model that distinguishes the fraudulent from the non-fraudulent transactions. In one embodiment, CART or other suitable model generation algorithms are applied to at least a portion of the data to generate the high risk rules models 572.

In one embodiment, historical data is split into multiple non-overlapping data sets. These multiple data sets are used for model generation and performance evaluation. For example, to train a neural network model, the data may be split into three sets: training set 1, training set 2, and a validation set. The training set 1 is used to train the neural network. The training set 2 is used during training to ensure that the learning converges properly and to reduce overfitting to the training set 1. The validation set is used to evaluate the trained model performance. Supervised models may include one or more of scorecards, naïve Bayesian, decision trees, logistic regression, and neural networks. Such techniques may also be applied to generate at least a portion of the combining model 112.
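A three-way split of this kind might be sketched as follows. The 60/20/20 proportions, the random shuffle, and the fixed seed are assumptions made for the example; the disclosure does not specify how the sets are sized or sampled.

```python
import random


def split_three_ways(records, fractions=(0.6, 0.2), seed=0):
    """Split records into non-overlapping training set 1, training set 2, and validation set.

    `fractions` gives the shares of training set 1 and training set 2;
    the remainder becomes the validation set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle; input is not modified
    first_cut = int(len(shuffled) * fractions[0])
    second_cut = first_cut + int(len(shuffled) * fractions[1])
    training_1 = shuffled[:first_cut]
    training_2 = shuffled[first_cut:second_cut]
    validation = shuffled[second_cut:]
    return training_1, training_2, validation
```

Because the three slices are taken from a single shuffled list, every record lands in exactly one set.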

FIG. 6B is a flowchart illustrating an example of a method 650 of performing the block 202 of FIG. 2. The illustrated example process generates entity models 540 in the historical transaction based fraud detection model 111. The method 650 begins at a block 662 in which the model generator 106 receives historical mortgage applications and data related to mortgage processing related entities such as an account executive, a broker, a loan officer, or an appraiser. Moving to a block 664, the model generator 106 selects risk indicators comprising one or more of the input data fields. In one embodiment, expert input is used to select the risk indicators for each type of entity to be modeled. In other embodiments, data driven techniques such as data mining are used to identify risk indicators.

Next at a block 668, the model generator 106 performs an unsupervised clustering algorithm such as k-means for each risk indicator for each type of entity. Moving to a block 680, the model generator 106 calculates scores for risk indicators for each received historical loan based on the distance of the data from data clusters identified by the clustering algorithm. For example, in a simple one cluster model where the data is distributed in a normal or Gaussian distribution, the distance may be a distance from the mean value. The distance/score may be adjusted based on the distribution of data for the risk indicator, e.g., based on the standard deviation in a simple normal distribution. Moving to a block 672, scores for each risk indicator and each entity are calculated based on a model, such as a weighted average of each of the applications associated with each entity. Other embodiments may use other models.
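The clustering and distance scoring steps might be sketched as follows for a single-valued risk indicator. This is a minimal one-dimensional k-means with a deterministic initialization chosen for the example; the initialization, the iteration count, and the use of the nearest centroid as the reference point are all assumptions rather than details taken from the disclosure.

```python
from statistics import mean


def kmeans_1d(values, k, iterations=20):
    """Cluster single-valued risk indicator data into k clusters; returns the centroids."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda j: abs(value - centroids[j]))
            clusters[nearest].append(value)
        # Keep the previous centroid if a cluster happens to be empty.
        centroids = [
            mean(cluster) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
    return centroids


def distance_score(value, centroids, scale):
    """Distance from the nearest centroid, adjusted by a spread measure such as a standard deviation."""
    return min(abs(value - centroid) for centroid in centroids) / scale
```

With `k=1` the centroid is the mean, which reduces to the simple one cluster example in which the score is the distance from the mean divided by the standard deviation.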

Fraud Detection Model: Model Score Calculation

FIG. 7 is a flowchart illustrating an embodiment of a method of generating a model score using the fraud model 111. The method 700 begins at a block 702 in which the origination system interface 522 receives loan application data. Next at a block 704, the data preprocessing module 524 preprocesses the application data as discussed above with reference to FIG. 5A.

Moving to a block 706, the application data is applied to the supervised loan models 570 which provide a score indicative of the relative likelihood or probability of fraud to the integrator 536. In one embodiment, the supervised loan models 570 may also provide risk indicators. Next at a block 708, the high risk rules model 572 is applied to the application to generate one or more risk indicators, and/or additional scores indicative of fraud. Moving to a block 710, the application data is applied to one or more of the entity models 540 to generate additional scores and risk indicators associated with the corresponding entities of the models 540 associated with the transaction.

Next at a block 712, the integrator 536 calculates a weighted score and risk indicators based on scores and risk indicators from the supervised loan model 570, the high risk rules model 572, and scores of entity models 540. In one embodiment, the integrator 536 includes an additional model, e.g., a trained supervised model, that combines the various scores, weights, and risk factors provided by the models 570, 572, and 540.

Moving to a block 714, the scores and risk indicators module 560 and the score review report module 562 generate a report providing a weighted score along with one or more selected risk indicators. The selected risk indicators may include explanations of potential types of frauds and recommendations for action.

Multi-Component Risk Model

FIG. 8 is a block diagram illustrating an embodiment of the multi-component risk model 113 for evaluating risks associated with mortgage lending. As shown, the multi-component risk model 113 may include several components, including a property/collateral component 802, a broker component 804, a borrower component 806, and a market component 808. The multi-component risk model 113 may also take, as input, data from a number of data sources, including lender contributed data 812 (e.g., mortgage data reported by lenders), third party data 814 (e.g., credit data, financial data, employment data), public records data 816 (e.g., property records), and other data 818.

In one embodiment, the property/collateral component 802 is configured to assess a risk of the subject property/collateral (e.g., an early payment default (90+ days delinquent in the first year)). Other example risks, such as the risk of a default over a longer time period, may be assessed as well. The property/collateral component 802 may be based on an evaluation of public records (e.g., assessor and recorder records) and property characteristic data (e.g., size of property, improvements, location, etc.). Beyond evaluating data relating to the subject property/collateral, the property/collateral component 802 may also evaluate data at a neighborhood level, assessing pricing dynamics, foreclosure dynamics, buy and sell trends, and/or valuation trends of nearby properties. The property/collateral component 802 may also base its risk score output on an automated value model (AVM) and/or a home price index (HPI) model. In one embodiment, based on a combination of these evaluations, the property/collateral component 802 is configured to render a score for a given property involved in a mortgage application. In one embodiment, the property/collateral risk score assesses a risk associated with over-valuation and fraudulent valuation of the subject property/collateral. In other embodiments, the property/collateral risk score may be used in evaluating mortgage applications and/or funded loans by an investment bank or as part of due diligence of a loan portfolio.

The broker component 804 may provide a risk score that assesses a risk associated with a particular broker. In one embodiment, at least a portion of the property/collateral model 802 is applied to loan data contributed by lenders (contributed data 812). Since the contributed data 812 identify the brokers associated with the loans, risks for the individual brokers may be calculated by aggregating the property/collateral risk scores of the properties associated with loans from the individual brokers. In one embodiment, the broker risk score predicts the risk of early default and/or fraud.
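The per-broker aggregation might be sketched as follows. The record layout (`broker_id`, `property_risk_score`) and the use of a simple mean as the aggregate are assumptions for illustration; the disclosure states only that the property/collateral risk scores are aggregated by broker.

```python
from collections import defaultdict
from statistics import mean


def broker_risk_scores(loans):
    """Aggregate property/collateral risk scores by broker.

    `loans` is an iterable of dicts with hypothetical keys
    "broker_id" and "property_risk_score".
    """
    scores_by_broker = defaultdict(list)
    for loan in loans:
        scores_by_broker[loan["broker_id"]].append(loan["property_risk_score"])
    return {broker: mean(scores) for broker, scores in scores_by_broker.items()}
```

Other aggregates, such as a maximum or a recency-weighted average, could be substituted without changing the grouping step.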

Likewise, the borrower component 806 may provide a risk score that assesses a risk associated with a particular borrower. In one embodiment, the borrower component 806 searches public records (e.g., assessor and recorder records) data 816 to find previous addresses associated with a borrower in question, and at least a portion of the property/collateral model 802 is then applied to properties associated with these previous addresses. In addition, the borrower component 806 may also evaluate the third party data 814 including the borrower's credit data, and any other proprietary data and/or public record data associated with the borrower. The borrower's risk score that is generated as a result of these evaluations predicts the default risk associated with the particular borrower.

Finally, the market component 808 may provide a risk score on the real property market in which the subject property is located. In one embodiment, the market component 808 applies at least a portion of the property/collateral model component 802 to properties within a specific geographic area (e.g., properties in the same ZIP code). In addition, the market component 808 may also evaluate public records data, any other proprietary data sources, and potentially derivative works of these data sources.

In one embodiment, the risk scores from the four components are combined to provide an overall risk score 810. In one embodiment, one or more of these five risk scores (the property/collateral risk score, the broker risk score, the borrower risk score, the market risk score, and the overall risk score) are provided as input to the combined model 112 to generate a combined score in accordance with the embodiments shown in FIGS. 1A-4. In addition, as with other individual models 110, the multi-component risk model 113 may provide risk indicators to the combined model so that specific risks may be displayed with the combined score. For example, risk indicators related to a high market risk score may be provided to the combined model 112 so that a user may be alerted to the fact that one factor contributing to a high combined risk score is that the local property market is at a high risk of price decline.

Early Payment Default Model

As referenced, the early payment default (EPD) model 115 may be used to create the combined model 112 and the output of the EPD model 115 (e.g., an EPD risk score ranging from 1-999) may be selected and processed into an input feature to the combined model 112. In various embodiments, the EPD model 115 employs statistical pattern recognition to generate a score designed to assess the risk of early payment default in mortgage applications and loans (e.g., default within the first few months of the repayment period). In one embodiment, the EPD model 115 finds early payment default risk based on historical patterns of both performing and non-performing mortgage loans from a database of historical loans. In one embodiment, the EPD model operates in a fashion similar to the fraud detection model 111. For example, a process similar to that shown in FIG. 7 can be employed in the EPD model 115, wherein steps 706, 708, and 710 would be customized and directed to detecting early payment default. As a further example, embodiments of the EPD model can be generated using a supervised learning model as described above in conjunction with FIG. 5C (step 570), using example loans with and without early payment default to effectively learn how to generate a score that represents the likelihood of a loan defaulting during a particular portion of the life of the loan.

Additional risk factors can be included in the supervisory models used for EPD detection native to fraud detection. Those factors can broadly be defined as: borrower's risk, geographic risk, borrower's affordability, and property valuation risk. Borrower's risk can include information such as a credit score, payment history, employment information, tenure in current employment position, debt, income, occupancy, etc. This information can be used to evaluate the risk factors associated with the borrower. For example, if the buyer has a risky credit score or employment, then he or she may be a higher risk for EPD and the EPD model 115 can take this into account. Property appraisal information and the geographic location of the property can also be used to determine the EPD risk. For example, the property may be overvalued relative to other properties in the area and/or the area may have a high rate of defaults. Thus, such information can be used in the EPD model 115 to determine a geographic risk factor and/or a property valuation risk factor. These risk factors may be output by the EPD model 115 as risk indicators, so that risk factors that provide significant contributions can be identified in a user display/report such as the one shown in FIG. 4.

FIG. 9 is a functional block diagram illustrating an example of the EPD model 115. As can be seen, the configuration of EPD model 115 is similar to that of the fraud detection model 111 as shown in FIG. 5A, with EPD models 932 replacing the loan models 532 and the introduction of credit data 925.

As shown, an origination system interface 922 provides mortgage application data to a data preprocessing module 924. The interface 922 may receive data from the mortgage origination system 116 as shown in FIG. 1A. A credit data system 925 can be configured to receive applicant credit data from one or more credit bureaus or from the lender, such as via the loan origination system interface 922, and to store and provide that data to the EPD model.

The origination system interface 922 can provide application data to the data preprocessing module 924, which formats application data into data formats used internally by the model 115. The data preprocessing module 924 can provide application data to one or more models for EPD risk scoring and processing. In one embodiment, application data is provided to one or more EPD models 932 that generate data indicative of EPD risk based on application and applicant data. The data indicative of EPD risk generated by the EPD models 932 can be provided to an integrator 936 that combines scores from one or more models into a final score. The data preprocessing module 924 can also provide application data to one or more entity models 940 that are configured to identify EPD risk based on data associated with entities involved in the processing of the application. Entity models can include models of data associated with loan brokers, loan officers or other entities involved in a loan application. Additional examples of such entity models 940 are illustrated with reference to FIG. 5B. Each of the entity models can output data to an entity scoring module 950 that is configured to provide a score and/or one or more risk indicators associated with the application data.

Optionally, the entity scoring module 950 can provide scores associated with one or more risk indicators associated with the particular entity or application. For example, appraisal value in combination with zip code can be a risk indicator associated with an EPD model. In one embodiment, the entity scoring module 950 provides scores and indicators to the integrator 936 to generate a combined EPD risk score and/or set of risk indicators.

The integrator 936 can be configured to apply weights and/or processing rules to generate one or more scores and risk indicators based on the data indicative of EPD risk provided by one or more of the EPD models 932, the entity models 940 and entity scoring modules 950. In one embodiment, the integrator 936 can generate a single score indicative of EPD risk along with one or more risk indicators relevant for the particular application. Additional scores can also be provided with reference to each of the risk indicators. The integrator 936 can provide this data to a scores and risk indicators module 960. In one embodiment, scores are numbers in the range of 1-999. As described above with reference to FIG. 1A, the scores and risk indicators are provided to the combined model 112 for calculation of the combined risk score. The risk indicators are presented to the user, for example, via an example interface shown in FIG. 4, to denote risk factors that provide significant contribution to the combined score. In one embodiment, risk indicators are represented as codes that are indicative of certain data fields or certain values for data fields. Risk indicators can provide information on the types of EPD risk and recommended actions. For example, risk indicators may include a credit score that falls within a range having a high percentage of defaults, a geographic area with a high risk of default, etc. Risk indicators can also be indicative of entity historical transactions, e.g., a CLTV percentage that is indicative of EPD risk.

As previously stated, additional description of the configuration set forth in FIG. 9 and other details of the EPD model 115 are disclosed in the above referenced U.S. Patent Publication No. 2009/0099959, filed on Oct. 6, 2008 and entitled “METHODS AND SYSTEMS OF PREDICTING MORTGAGE PAYMENT RISK.”

Income Related Fraud Detection Model

FIG. 10 is a flowchart illustrating an embodiment of the model 117 for detecting fraud based on applicant income for use with other models as in an embodiment illustrated in FIGS. 1A-4. The method begins at a block 1010 in which the model 117 receives stated income information submitted by the applicant and pertaining to an employment income of the applicant. Next at a block 1012, the model 117 automatically obtains additional information from a source other than the applicant. The additional information is related to the stated income information and is obtained using information supplied by the applicant. In one embodiment, the additional information comprises typical income levels in at least one neighborhood of residence of the applicant. In one embodiment, the model 117 automatically generates one or more links to a search service, wherein the links correspond to search terms related to the applicant's stated income.

Moving to a block 1014, the model 117 programmatically uses the additional information to generate a validity measure reflective of a likelihood that the stated income information is accurate. In one embodiment, the model 117 automatically uses employment information supplied by the applicant in a free-form format to automatically select an employment category of the applicant, and uses the selected employment category to assess the stated income information. In one embodiment, the model 117 generates an estimated income level of the applicant based, at least in part, on employment and residence information of the applicant, and compares the estimated income level to the stated income information. In one embodiment, the model 117 automatically gathers information indicative of incomes of others having similar employment to that of the applicant. In one such embodiment, the model 117 optionally programmatically generates a report which includes the validity measure and information regarding incomes of others having similar employment.

In one embodiment, the model 117 uses information supplied by the applicant to automatically identify at least one previous residence address of the applicant, and to obtain information regarding a typical income level in a neighborhood corresponding to said previous residence address. In one embodiment, the at least one previous residence address is automatically obtained using a social security number supplied by the applicant.

The model 117 may also incorporate other component models. For example, in one embodiment, the model 117 is configured to receive an indication of the income stated by the applicant, query a database to obtain information related to a source or sources of the stated income, and determine an employment profile corresponding to the income source or sources. The employment profile may be based at least partially on the obtained information. The model 117 may further determine a representative income reflective of incomes of others having a comparable employment profile and calculate a validity measure reflective of a degree of correspondence between the stated income and the representative income. The information indicative of the source or sources of income may comprise one or more of business address, business telephone number, co-worker names, type of business, and business name. The employment profile may comprise at least one of the following: occupation, job position, length of experience, salary and location. The model 117 may determine a representative income by determining a range of incomes of others having a comparable employment profile. The range may be bounded by selected percentiles of a group of the others. In one embodiment, the model 117 communicates with at least one third party source of information, and the determining is based at least in part on the third party information. Additional alternative embodiments and details of model 117 are disclosed in U.S. patent application Ser. No. 11/864,606, filed on Sep. 28, 2007, the disclosure of which has been incorporated by reference above.
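One way a percentile-bounded income range and a validity measure might be computed is sketched below. The nearest-rank percentile method, the 10th and 90th percentile bounds, and the linear decay outside the range are all assumptions made for this example; the disclosure states only that the validity measure reflects the degree of correspondence between the stated income and the representative income. Peer incomes are assumed to be positive.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct between 0 and 100)."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def income_validity(stated_income, peer_incomes, low_pct=10, high_pct=90):
    """Hypothetical validity measure in [0, 1].

    Returns 1.0 when the stated income lies inside the percentile range of
    incomes of others with a comparable employment profile, and decays
    linearly with the relative distance from the nearest bound otherwise.
    """
    ordered = sorted(peer_incomes)
    low = percentile(ordered, low_pct)
    high = percentile(ordered, high_pct)
    if low <= stated_income <= high:
        return 1.0
    bound = low if stated_income < low else high
    return max(0.0, 1.0 - abs(stated_income - bound) / bound)
```

Under these assumptions, a stated income 25% above the upper bound would receive a validity measure of 0.75, and one at double the upper bound or more would receive 0.0.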

Implementations/Alternative Embodiments

The various functional blocks 106, 110, 111, 112, 113, 115, 117, 119, 122, 126, and 128 shown in FIG. 1A may be implemented in computer hardware (e.g., one or more computers, computer processors, or other units of computing machinery) programmed with executable code modules. The code modules may be stored on any type or types of computer storage devices or computer-readable media (e.g., hard disk drives, optical disk drives, solid state storage devices, etc.), and may embody (i.e., direct the computer hardware to perform) the various steps and functions described herein. In some embodiments, the various code modules of the system 110 may be distributed across multiple distinct computers or computing devices that are interconnected on a network, and which collectively operate as a risk assessment computing system or machine. The scores and other data generated by the various models, including the combined models 112, may be stored by transforming the electrical, magnetic, or other states of physical storage devices. Although preferably implemented in program modules, some components of the system 110, such as specific models, may alternatively be implemented in whole or in part in application-specific circuitry (e.g., an ASIC or FPGA) or other special purpose hardware.

It is to be recognized that, depending on the embodiment, certain acts or events of any of the methods described herein can be performed in a different sequence, may be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the method). Moreover, in certain embodiments, acts or events may be performed concurrently, e.g., through multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially. Further, in some embodiments, certain components of the disclosed systems may be omitted.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

Conclusion

While the above detailed description has shown, described, and pointed out novel features of the invention as applied to various embodiments, it will be understood that various omissions, substitutions, and changes in the form and details of the device or process illustrated may be made by those skilled in the art without departing from the spirit of the invention. As will be recognized, the present invention may be embodied within a form that does not provide all of the features and benefits set forth herein, as some features may be used or practiced separately from others. The scope of the invention is indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

What is claimed is:
 1. A system for detecting and assessing lending risks, the system comprising: a computer system comprising one or more computing devices, the computer system programmed, via executable code modules, to implement: a combined risk detection model for detecting and assessing data indicative of a plurality of risks in loan application data, the combined risk detection model adapted to receive as input a plurality of input features extracted from two or more of a plurality of risk detection models, the plurality of risk detection models comprising: a fraud model that detects the presence of data indicative of fraud in the loan application data, the fraud model configured to generate a fraud model score that is indicative of the presence of fraud in the loan application data; a multi-component risk model that assesses risks associated with a loan referenced by the loan application data, the multi-component risk model based at least in part on external data not within the loan application data, the multi-component risk model configured to generate a multi-component risk score indicative of risk associated with the loan; and a default risk model that detects the presence of data indicative of a risk of early payment default in the loan application data, the default risk model configured to generate a default risk score indicative of the risk of early payment default, wherein the computer system is further programmed to: determine combinability of the plurality of risk detection models based at least partly on evaluating predictive performance of scores from one or more combinations of the risk detection models against historical transactions data; and extracting input features from the risk detection models that are determined to be combinable, the input features created from a mathematical combination of the scores and additional data, wherein the scores from the models determined to be combinable include one or more of the fraud model score, the multi-component risk score, or the default risk score; generate a composite risk score based at least in part on the scores from the models determined to be combinable and the extracted input features; and a score reporting module that reports the composite risk score generated by the combined risk detection model.
 2. The system of claim 1, wherein the input features are extracted from one or more of: scores from the plurality of risk detection models or additional data related to the loan application not output by the plurality of risk detection models.
 3. The system of claim 2, wherein the additional data related to the loan application not output by the plurality of risk detection models comprises a loan amount.
 4. The system of claim 1, wherein the score reporting module further reports a plurality of risk indicators generated by the plurality of risk detection models.
 5. The system of claim 4, wherein each of the risk indicators references a loan risk and is classified in accordance with a weight contribution of the referenced loan risk to the composite risk score.
 6. The system of claim 5, wherein each of the risk indicators is classified as a high risk, a moderate risk, or a low risk based on the weight contribution of the referenced loan risk.
 7. The system of claim 1, wherein a modeling method used to construct the combined risk detection model comprises one of: linear regression, logical regression, neural networks, support vector machines, or decision trees generated using a machine learning algorithm that uses a tree-like graph to predict an outcome.
 8. The system of claim 1, wherein the combined risk detection model comprises one or more of the following modeling structures: a cascade structure in which scores of the risk detection models are ranked in a specified order, and at least a first risk detection model score is used to generate a first stage score, and at least the first stage score and a second risk detection score are used to generate a second stage score; and a divide-and-conquer structure in which a combination gate incorporates scores from the plurality of risk detection models with interactive input fields to generate the composite risk score.
 9. The system of claim 1, wherein the evaluating of the predictive performance of the risk detection models includes a correlation analysis or a swap analysis of the scores generated by the risk detection models, wherein the correlation analysis determines similarity among the scores from the risk detection models and the swap analysis determines overlap of a review population based on outsorting logic of the scores of the risk detection models.
 10. The system of claim 1, wherein the input features correspond to results of mathematical operations on the scores from the plurality of risk detection models, the mathematical operations further comprising: obtaining a maximum of the scores, obtaining a minimum of the scores, obtaining an average of the scores, and obtaining a dynamic range of the scores, and obtaining a ratio of the dynamic range over the average.
 11. The system of claim 1, wherein the multi-component risk model outputs a plurality of component scores comprising one or more of: a collateral component score, a broker component score, a borrower component score, and a market component score.
 12. The system of claim 11, wherein the one or more component scores are calculated based on external data not within the loan application data, the external data comprising data contributed by lenders, credit data, and public records data.
 13. A computerized method of detecting and assessing mortgage lending risks, the method comprising: receiving, on a physical computer processor, mortgage transaction data that includes loan application data and historical transactions data; determining, on a physical computer processor, combinability of a plurality of risk detection models, the determining comprising determining predictive performance of scores from one or more combinations of the risk detection models as compared to the historical transactions data, wherein the plurality of risk detection models further comprise two or more of: an income fraud detection model that detects the presence of data indicative of fraud in income data within the mortgage transaction data, the income fraud detection model generating an income fraud score indicative of fraud in the income data; a default risk model that detects a risk of payment default in the mortgage transaction data, the default risk model generating a default risk score indicative of the risk of payment default; and a multi-component risk model that detects risks in loan transactions, the multi-component risk model further comprising scores from one or more of the following components: a collateral component, a broker component, a borrower component, and a market component, the multi-component risk model generating a multi-component risk score indicative of risk in the mortgage transaction data; extracting, on a physical computer processor, input features from the risk detection models that are determined to be combinable, the input features created from a mathematical combination of the scores and additional data; applying, on a physical computer processor, a combined risk detection model to the mortgage transaction data to generate a composite risk score, wherein the combined risk detection model is configured to receive the extracted input features from the plurality of risk detection models determined to be combinable and mortgage transaction data not output by the risk detection models, the extracted input features further being selected as based at least in part on a modeling method used to construct the combined risk detection model; generating, on a physical computer processor, a report including the composite risk score generated by the combined risk detection model.
 14. The method of claim 13, determining the combinability of the plurality of risk detection models is based at least in part on the correlation of the results of applying the plurality of risk detection models to historical transaction data.
 15. The method of claim 14, wherein the correlation is based at least in part on a measure of the similarity of the results among the plurality of risk detection models.
 16. The method of claim 13, wherein the modeling method is one of: linear regression, logical regression, neural networks, support vector machines, and decision trees generated using a machine learning algorithm that uses a tree-like graph to predict an outcome.
 17. The method of claim 13, wherein the input features are selected by: applying each of the plurality of risk detection models to data to generate a score for each risk detection model, the data comprising historical mortgage transaction data; identifying an interaction among scores from the plurality of risk detection models; and using the interaction as a basis for the selection of the input features.
 18. The method of claim 13, wherein the input features are selected by: applying each of the plurality of risk detection models to data to generate a score for each risk detection model, the data comprising historical mortgage transaction data; performing a swap analysis on the scores from applying the plurality of risk detection models to the data; and using the result of the swap analysis as a basis for the selection of the input features.
 19. The method of claim 13, wherein the generating further provides a plurality of risk indicators generated by the combined risk detection model.
 20. The method of claim 19, wherein each of the risk indicators references a loan risk and is classified in accordance with a weight contribution of the referenced loan risk to the composite risk score.
 21. The method of claim 20, wherein each of the risk indicators is classified as a high risk, a moderate risk, or a low risk based on the weight contribution of the referenced loan risk.
 22. The method of claim 13, wherein the default risk model detects a risk of payment default occurring in one of the following periods: within the first 90 days of a loan repayment period, within the first six months of the loan repayment period, and within the first year of the loan repayment period.
 23. A method for creating a model for detecting loan risks, the method comprising: receiving loan transaction data linked with historical loan performance data related to risks including fraud and default risks; executing a plurality of risk detection models on the loan transaction data to obtain a respective model score from each of a plurality of risk detection models, wherein the plurality of risk detection models further comprise two or more of: a fraud model that detects the presence of data indicative of fraud, the fraud model derived from loan transaction data, the fraud model generating a fraud score indicative of the presence of fraud in the loan transaction data; a multi-component risk model that detects risks in loan transactions, the multi-component risk model based at least in part on data external to the loan application data including credit data and public records data, the multi-component risk model generating a multi-component risk score indicative of the risk in the loan transaction data; and a default risk model that detects the presence of data indicative of a risk of early payment default, the default risk model generating a default risk score indicative of the risk of early payment default; determining the combinability of the plurality of risk detection models based at least partly on evaluating predictive performance of scores from one or more combinations of the risk detection models against the historical loan performance data; extracting input features from the risk detection models that are determined to be combinable, the input features created from a mathematical combination of the scores and additional data, wherein the scores from the risk detection models determined to be combinable include one or more of the fraud score, the multi-component risk score, or the default risk score; and constructing a combined risk detection model based on the extracted input features, the receiving, executing, determining, extracting, and constructing performed by a computer system that comprises one or more computing devices.
 24. The method of claim 23, wherein the constructing further comprises: selecting one or more model structures for the combined risk detection model; selecting a modeling method for the combined risk detection model; training the combined risk detection model on the loan transaction data; and evaluating the performance of the combined risk detection model, the selecting one or more model structures, selecting a modeling method, training, and evaluating performed by a computer system that comprises one or more computing devices.
 25. The method of claim 24, wherein the modeling method is one of: linear regression, logical regression, neural networks, support vector machines, and decision trees generated using a machine learning algorithm that uses a tree-like graph to predict an outcome.
 26. The method of claim 24, wherein the one or more model structures comprises: a cascade structure in which scores of the risk detection models are ranked in a specified order, and at least a first risk detection model score is used to generate a first stage score, and at least the first stage score and a second risk detection score are used to generate a second stage score; and a divide-and-conquer structure in which a combination gate incorporates scores from the plurality of risk detection models with interactive input fields to generate the composite risk score.
 27. The system of claim 1, wherein the input features are extracted from the two or more risk detection models by mathematically combining scores from the plurality of risk detection models for input into the combined risk detection model, the input features being selected as based at least in part on a comparison of the predictive performance of the risk detection models and a selection of a modeling method used to construct the combined risk detection model.
 28. The system of claim 1, wherein the plurality of risk detection models further comprises an income fraud detection model that detects the presence of data indicative of fraud in income data within the mortgage transaction data, the income fraud detection model configured to generate an income fraud score indicative of fraud in the income data.
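
By way of illustration only, and without limiting the claims, the score-derived input features recited in claim 10 (maximum, minimum, average, dynamic range, and the ratio of the dynamic range over the average) may be sketched as follows. The function name, the dictionary keys, and the example score values are hypothetical. Reading "dynamic range" as the maximum minus the minimum, and returning 0.0 for the ratio when the average is zero, are assumptions of this sketch and are not taken from the specification.

```python
from statistics import mean


def extract_score_features(scores):
    """Summarize one loan's model scores into claim-10 style input features.

    `scores` maps a (hypothetical) model name to that model's numeric score.
    """
    if not scores:
        raise ValueError("at least one model score is required")

    values = list(scores.values())
    highest = max(values)
    lowest = min(values)
    average = mean(values)

    # Assumption: "dynamic range" is read as the spread between the
    # highest and lowest model scores.
    dynamic_range = highest - lowest

    # Assumption: a zero average yields a ratio of 0.0 to avoid dividing by zero.
    ratio = dynamic_range / average if average else 0.0

    return {
        "max": highest,
        "min": lowest,
        "average": average,
        "dynamic_range": dynamic_range,
        "range_over_average": ratio,
    }


# Hypothetical scores for a single loan application.
scores = {"fraud": 800, "multi_component": 600, "default_risk": 400}
features = extract_score_features(scores)
print(features)
```

In such a sketch, the resulting features could be supplied to a combined risk detection model together with additional data such as the loan amount of claim 3. How the features are weighted or selected depends on the modeling method chosen and is not shown here.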